python pandas multi-index

How to perform a string operation on an entire pandas MultiIndex?


I have a pandas dataframe with a two-level column index. It's read in from a spreadsheet where the author used a lot of whitespace to accomplish things like alignment (for example, one column is called 'Tank #').

I've been able to remove the whitespace on the levels individually...

level0 = df.columns.levels[0].str.replace(r'\s', '', regex=True)
level1 = df.columns.levels[1].str.replace(r'\s', '', regex=True)
# set_levels(..., inplace=True) was removed in pandas 2.0, so assign the result back
df.columns = df.columns.set_levels([level0, level1])

...but I'm curious if there is a way to do it without having to change each individual level one at a time.

I tried

df.columns.set_levels(df.columns.str.replace(r'\s', '', regex=True))

but got AttributeError: Can only use .str accessor with Index, not MultiIndex.

Here is a small sample subset of the data.

  Run Info                                      Run Data              
     run # Tank                             # Step A\npH concentration
0     6931                                  5       5.29         33.14
1     6932                                  1       5.28         33.13
2     6933                                  2       5.32          33.4
3     6934                                  3       5.19         32.98
4     6935                                  4       5.28          32.7
>>> df.to_dict('list')
{('Run Info', 'run #'): [6931, 6932, 6933, 6934, 6935],
 ('Run Info', 'Tank                             #'): [5, 1, 2, 3, 4],
 ('Run Data', 'Step A\npH'): [5.29, 5.28, 5.32, 5.19, 5.28],
 ('Run Data', 'concentration'): [33.14, 33.13, 33.4, 32.98, 32.7]}

Solution

  • How about rename? With MultiIndex columns, the function you pass is applied to the labels of every level:

    import re
    
    df.rename(columns=lambda x: re.sub(r'\s+', ' ', x.strip()), inplace=True)
    

    This collapses each run of whitespace into a single space. If you don't want to keep any whitespace at all, pass '' instead of ' ' as the replacement string in re.sub.
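
For reference, here is a minimal sketch of the rename approach applied to the sample data from the question. The names `data`, `collapsed`, and `stripped` are only illustrative, and it assumes every column label is a string (a non-string label would fail on `.strip()` or `re.sub`).

```python
import re

import pandas as pd

# Sample data copied from the question's df.to_dict('list') output.
data = {
    ('Run Info', 'run #'): [6931, 6932, 6933, 6934, 6935],
    ('Run Info', 'Tank                             #'): [5, 1, 2, 3, 4],
    ('Run Data', 'Step A\npH'): [5.29, 5.28, 5.32, 5.19, 5.28],
    ('Run Data', 'concentration'): [33.14, 33.13, 33.4, 32.98, 32.7],
}
# Tuple keys give the frame two-level (MultiIndex) columns.
df = pd.DataFrame(data)

# Variant 1: trim the ends and collapse each whitespace run to one space.
collapsed = df.rename(columns=lambda x: re.sub(r'\s+', ' ', x.strip()))

# Variant 2: drop all whitespace, which is what the question asked for.
stripped = df.rename(columns=lambda x: re.sub(r'\s+', '', x))

print(collapsed.columns.tolist())
print(stripped.columns.tolist())
```

Both calls return a new DataFrame rather than modifying `df`; pass `inplace=True` as in the answer above if you prefer to change it in place.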