pandas

Why the behaviour of set_levels is unpredictable?


I have this dataframe that is read from excel file :

import pandas as pd
import numpy as np

columns = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y', 'z']])
df = pd.DataFrame(np.arange(1, len(columns)*5+1).reshape(-1, len(columns)), index=range(5), columns=columns)

print(df)

    A           B        
    x   y   z   x   y   z
0   1   2   3   4   5   6
1   7   8   9  10  11  12
2  13  14  15  16  17  18
3  19  20  21  22  23  24
4  25  26  27  28  29  30

I need to make transformation to have unique columns names in the bottom level of columns.

I made the new columns and use set_levels but the result is weird. It takes only the left part of new_columns. And sometimes in other datasets set_levels has stricly no effect. https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.set_levels.html

new_columns = ['x1', 'y1', 'z1'] + ['x2', 'y2', 'z2']

df.columns = df.columns.set_levels(new_columns, level=1)

print(df)

    A           B        
   x1  y1  z1  x1  y1  z1
0   1   2   3   4   5   6
1   7   8   9  10  11  12
2  13  14  15  16  17  18
3  19  20  21  22  23  24
4  25  26  27  28  29  30

Am I missing something guys ?


Solution

  • set_levels is not doing what you think it is.

    Indices are categorical values, they are encoded as integers. You can't pass a list of new values.

    In fact, if you check the codes of your indices:

    df.columns.codes
    
    FrozenList([[0, 0, 0, 1, 1, 1],
                [0, 1, 2, 0, 1, 2]])
    

    You will see that 0, 1, 2 in the level 1 are duplicated, thus passing ['x1', 'y1', 'z1', 'x2', 'y2', 'z2'], only the first 3 values are used.

    You must craft a new MultiIndex:

    data = list(zip(*df.columns.to_list()))
    # or, shortcut
    # data = list(zip(*df))
    data[1] = new_columns
    
    df.columns = pd.MultiIndex.from_arrays(data)
    

    Output:

        A           B        
       x1  y1  z1  x2  y2  z2
    0   1   2   3   4   5   6
    1   7   8   9  10  11  12
    2  13  14  15  16  17  18
    3  19  20  21  22  23  24
    4  25  26  27  28  29  30