Remove or replace spaces in column names

How do I replace the spaces in a DataFrame's column names with "_"?

['join_date' 'fiscal_quarter' 'fiscal_year' 'primary_channel'
'secondary_channel' 'customer_count' 'new_members' 'revisit_next_day'
'revisit_14_day' 'demand_1yr' 'revisit_next_day_rate'
'revisit_14_day_rate' 'demand_1yr_per_new_member' u'ch_Ad Network'
u'ch_Affiliate' u'ch_Branded SEM' u'ch_DSP' u'ch_Daily Email'
u'ch_Daily Messaging' u'ch_Direct' u'ch_Direct Publisher' u'ch_Email'
u'ch_Feeds' u'ch_Native' u'ch_Non-Branded SEM' u'ch_Organic Search'
u'ch_Paid Social' u'ch_Site' u'ch_Special Email' u'ch_Television'
u'ch_Trigger Email' u'ch_UNMAPPED' u'ch_Unpaid Social' u'quarter_Q2'
u'quarter_Q3' u'quarter_Q4']
  • To remove whitespace:
  1. Everywhere:
df.columns = df.columns.str.replace(' ', '')
  2. At the beginning of each name:
df.columns = df.columns.str.lstrip()
  3. At the end of each name:
df.columns = df.columns.str.rstrip()
  4. At both ends:
df.columns = df.columns.str.strip()
  • To replace whitespace with another character (an underscore, for instance). The patterns using ^ and $ are regular expressions, so pass regex=True; since pandas 2.0, str.replace treats the pattern as a literal string by default:
  1. Everywhere:
df.columns = df.columns.str.replace(' ', '_')
  2. At the beginning (a run of leading spaces becomes one underscore):
df.columns = df.columns.str.replace('^ +', '_', regex=True)
  3. At the end (a run of trailing spaces becomes one underscore):
df.columns = df.columns.str.replace(' +$', '_', regex=True)
  4. At both ends:
df.columns = df.columns.str.replace('^ +| +$', '_', regex=True)
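To check what each variant actually produces, here is a minimal sketch on a few hypothetical column names with leading, trailing and inner spaces. It passes regex=True explicitly for the anchored patterns and regex=False for the literal one, so it should behave the same on pandas 1.x and 2.x.

```python
import pandas as pd

# Hypothetical column names with leading, trailing and inner spaces
df = pd.DataFrame(columns=[" ch Ad Network ", "ch_DSP", "  ch Paid Social"])

# Remove every space (literal match)
no_spaces = df.columns.str.replace(" ", "", regex=False)

# Strip whitespace at both ends only; inner spaces survive
stripped = df.columns.str.strip()

# Anchored patterns are regular expressions, so regex=True is required
leading = df.columns.str.replace(r"^ +", "_", regex=True)
both_ends = df.columns.str.replace(r"^ +| +$", "_", regex=True)

print(list(no_spaces))   # ['chAdNetwork', 'ch_DSP', 'chPaidSocial']
print(list(stripped))    # ['ch Ad Network', 'ch_DSP', 'ch Paid Social']
print(list(leading))     # ['_ch Ad Network ', 'ch_DSP', '_ch Paid Social']
print(list(both_ends))   # ['_ch Ad Network_', 'ch_DSP', '_ch Paid Social']
```

Note that the leading-only pattern leaves the trailing space on the first name untouched.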

All of the above also works on the values of a single column through Series.str. Assuming a column named col:

df[col] = df[col].str.strip()  # or .str.replace(...) as above
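A small sketch of the same idea applied to column values rather than column names; the column name "channel" and its values are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame whose values (not names) contain stray spaces
df = pd.DataFrame({"channel": [" Ad Network", "Paid Social ", " DSP "]})
col = "channel"

# Strip both ends, then replace the remaining inner spaces with underscores
df[col] = df[col].str.strip().str.replace(" ", "_", regex=False)

print(df[col].tolist())  # ['Ad_Network', 'Paid_Social', 'DSP']
```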

The methods can also be chained:

df.columns = df.columns.str.strip().str.replace(' ', '_')
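Applied to a subset of the column names from the question (with a stray trailing space added to one of them for illustration), the chain above should give:

```python
import pandas as pd

# A few column names from the question; the trailing space on one is added for illustration
cols = ["join_date", "ch_Ad Network", "ch_Branded SEM", "ch_Non-Branded SEM ", "quarter_Q2"]
df = pd.DataFrame(columns=cols)

# Strip the ends first so trailing spaces do not turn into trailing underscores
df.columns = df.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(df.columns))
# ['join_date', 'ch_Ad_Network', 'ch_Branded_SEM', 'ch_Non-Branded_SEM', 'quarter_Q2']
```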

Plain Python string methods are fast and can be used in a list comprehension to fix column names (this assumes every column label is a string):

# replace spaces with underscores
df.columns = [c.replace(' ', '_') for c in df]

# strip leading whitespace
df.columns = [c.lstrip() for c in df]

# strip trailing whitespace
df.columns = [c.rstrip() for c in df]

# replace a run of leading spaces with a single underscore (names without one are left alone)
df.columns = ['_' + c.lstrip(' ') if c.startswith(' ') else c for c in df]
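To confirm that a comprehension matches the pandas regex version, a quick sketch on hypothetical names compares the two results directly. The conditional matters: prefixing every name with '_' unconditionally would also change names that have no leading space.

```python
import pandas as pd

df = pd.DataFrame(columns=["  ch Ad Network", "ch_DSP", " ch Paid Social "])

# pandas accessor version: a run of leading spaces becomes one underscore
via_pandas = list(df.columns.str.replace(r"^ +", "_", regex=True))

# list-comprehension version: only touch names that actually start with a space
via_python = ["_" + c.lstrip(" ") if c.startswith(" ") else c for c in df]

assert via_pandas == via_python
print(via_python)  # ['_ch Ad Network', 'ch_DSP', '_ch Paid Social ']
```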

Or map a string method over the column labels:

# strip leading whitespace
df.columns = list(map(str.lstrip, df))
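When several steps are needed, one option is to wrap them in a small function and map that instead. The helper clean_name below is hypothetical; DataFrame.rename(columns=...) also accepts a callable and applies it to every column label, returning a new DataFrame.

```python
import pandas as pd

def clean_name(name: str) -> str:
    """Hypothetical helper: strip both ends, then replace inner spaces with '_'."""
    return name.strip().replace(" ", "_")

df = pd.DataFrame(columns=[" ch Ad Network ", "ch_DSP", "ch Paid Social"])

# map() applies the helper to every column label
df.columns = list(map(clean_name, df))
print(list(df.columns))  # ['ch_Ad_Network', 'ch_DSP', 'ch_Paid_Social']

# rename() with a callable does the same and returns a new DataFrame
df2 = pd.DataFrame(columns=[" a b ", "c"]).rename(columns=clean_name)
print(list(df2.columns))  # ['a_b', 'c']
```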

pandas' string methods (pandas.Index.str and pandas.Series.str) are not truly vectorized for object-dtype data: each call still loops over the values in Python and adds its own overhead. Python string methods in a comprehension are therefore usually faster, especially when several operations are chained.

For example, with 100k column names and three chained operations, the Python version was roughly 2 to 5 times faster than the equivalent pandas calls in the IPython %timeit runs below:

import pandas as pd

n = 100_000
df = pd.DataFrame([range(n)], columns=[f" {i} {j} " for i, j in zip(range(n), range(n, 0, -1))])


%timeit df.set_axis(df.columns.str.replace('^ +', 'S', regex=True).str.replace(' +$', 'E', regex=True).str.replace(' ', '_'), axis=1)
# 331 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


%timeit df.set_axis('S' + df.columns.str.strip().str.replace(' ', '_') + 'E', axis=1)
# 118 ms ± 3.66 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


%timeit df.set_axis(['S' + c.strip().replace(' ', '_') + 'E' for c in df], axis=1)
# 68 ms ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
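Outside IPython, where %timeit is not available, a comparable measurement can be sketched with the standard timeit module. This uses 10k columns instead of 100k so it finishes quickly, and it checks that both versions produce identical labels before timing them; the printed numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

n = 10_000  # smaller than the 100k above so the sketch runs quickly
df = pd.DataFrame([range(n)], columns=[f" {i} {j} " for i, j in zip(range(n), range(n, 0, -1))])

def with_pandas():
    # pandas accessor chain, mirroring the second %timeit above
    return "S" + df.columns.str.strip().str.replace(" ", "_", regex=False) + "E"

def with_python():
    # plain string methods in a comprehension, mirroring the third %timeit above
    return ["S" + c.strip().replace(" ", "_") + "E" for c in df]

# Timing only means something if both approaches give the same labels
assert list(with_pandas()) == with_python()

for fn in (with_pandas, with_python):
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```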