I have this dataframe that is read from excel file :
import pandas as pd
import numpy as np
columns = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y', 'z']])
df = pd.DataFrame(np.arange(1, len(columns)*5+1).reshape(-1, len(columns)), index=range(5), columns=columns)
print(df)
A B
x y z x y z
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
4 25 26 27 28 29 30
I need to make transformation to have unique columns names in the bottom level of columns.
I made the new columns and use set_levels
but the result is weird. It takes only the left part of new_columns
.
And sometimes in other datasets set_levels
has stricly no effect.
https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.set_levels.html
new_columns = ['x1', 'y1', 'z1'] + ['x2', 'y2', 'z2']
df.columns = df.columns.set_levels(new_columns, level=1)
print(df)
A B
x1 y1 z1 x1 y1 z1
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
4 25 26 27 28 29 30
Am I missing something guys ?
set_levels
is not doing what you think it is.
Indices are categorical values, they are encoded as integers. You can't pass a list of new values.
In fact, if you check the codes of your indices:
df.columns.codes
FrozenList([[0, 0, 0, 1, 1, 1],
[0, 1, 2, 0, 1, 2]])
You will see that 0, 1, 2 in the level 1 are duplicated, thus passing ['x1', 'y1', 'z1', 'x2', 'y2', 'z2']
, only the first 3 values are used.
You must craft a new MultiIndex:
data = list(zip(*df.columns.to_list()))
# or, shortcut
# data = list(zip(*df))
data[1] = new_columns
df.columns = pd.MultiIndex.from_arrays(data)
Output:
A B
x1 y1 z1 x2 y2 z2
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
4 25 26 27 28 29 30