I have a pandas dataframe with a two-level column index. It's read in from a spreadsheet where the author used a lot of whitespace to accomplish things like alignment (for example, one column is called 'Tank #').
I've been able to remove the whitespace on the levels individually...
level0 = df.columns.levels[0].str.replace(r'\s', '', regex=True)
level1 = df.columns.levels[1].str.replace(r'\s', '', regex=True)
# set_levels returns a new MultiIndex; its inplace argument was removed in pandas 2.0
df.columns = df.columns.set_levels([level0, level1])
...but I'm curious if there is a way to do it without having to change each individual level one at a time.
I tried
df.columns.set_levels(df.columns.str.replace('\s', '', regex=True))
but got AttributeError: Can only use .str accessor with Index, not MultiIndex.
Here is a small sample subset of the data.
  Run Info          Run Data
     run # Tank # Step A\npH concentration
0     6931      5       5.29         33.14
1     6932      1       5.28         33.13
2     6933      2       5.32          33.4
3     6934      3       5.19         32.98
4     6935      4       5.28          32.7
>>> df.to_dict('list')
{('Run Info', 'run #'): [6931, 6932, 6933, 6934, 6935],
('Run Info', 'Tank #'): [5, 1, 2, 3, 4],
('Run Data', 'Step A\npH'): [5.29, 5.28, 5.32, 5.19, 5.28],
('Run Data', 'concentration'): [33.14, 33.13, 33.4, 32.98, 32.7]}
How about rename:
import re
df.rename(columns=lambda x: re.sub(r'\s+', ' ', x.strip()), inplace=True)
If you don't want to keep any of the spaces, you can just replace ' ' with '' in the re.sub call.
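To see the effect on both levels at once, here is a minimal sketch that rebuilds the sample frame from the df.to_dict('list') output in the question and applies rename to it. The names collapsed and stripped are just illustrative, and the sketch assumes every column label is a string (x.strip() and re.sub would fail on numeric labels).

```python
import re
import pandas as pd

# Sample frame rebuilt from the df.to_dict('list') output in the question.
# Tuple keys produce a two-level MultiIndex on the columns.
df = pd.DataFrame({
    ('Run Info', 'run #'): [6931, 6932, 6933, 6934, 6935],
    ('Run Info', 'Tank #'): [5, 1, 2, 3, 4],
    ('Run Data', 'Step A\npH'): [5.29, 5.28, 5.32, 5.19, 5.28],
    ('Run Data', 'concentration'): [33.14, 33.13, 33.4, 32.98, 32.7],
})

# With MultiIndex columns, rename calls the function on each label of each level.
# Variant 1: trim the ends and collapse any whitespace run to a single space.
collapsed = df.rename(columns=lambda x: re.sub(r'\s+', ' ', x.strip()))

# Variant 2: drop all whitespace entirely.
stripped = df.rename(columns=lambda x: re.sub(r'\s+', '', x))

print(collapsed.columns.tolist())
print(stripped.columns.tolist())
```

Both calls return a new frame rather than modifying df, so the original labels stay available for comparison; pass inplace=True as above if you'd rather overwrite them.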