After performing some operation on a dataset I ended up having a data frame like this:
df = pd.DataFrame([1, 2, 3, 4, 5],
index = pd.MultiIndex.from_arrays(np.array([[0, 0, 1, 1, 1],
[1448, 1448, 1448, 1448, 1448],
[0, 0, 1, 1, 1],
[1448, 1448, 1448, 1448, 1448],
['22', '11', '14', '15', '19']]),
names = ['meas_id', 'hec_id', 'meas_id', 'hec_id', 'lab_id']))
print(df)
0
meas_id hec_id meas_id hec_id lab_id
0 1448 0 1448 22 1
11 2
1 1448 1 1448 14 3
15 4
19 5
As you can notice, meas_id
and hec_id
columns of MultiIndex
are duplicated. I could use reset_index([3, 4], drop=True)
to drop them, but this resulting data frame is actually a part of a list of data frames, where some of them have those index columns duplicated and some do not. Therefore my question is: how can I drop duplicated index columns from MultiIndex?
Please note that this question refer to duplicates in columns, not in row.
For a solution that generalises in dropping duplicate index names you can do:
import numpy as np
ixs = np.flatnonzero(pd.Index(df.index.names).duplicated()).tolist()
df.reset_index(ixs, drop=True)
0
meas_id hec_id lab_id
0 1448 22 1
11 2
1 1448 14 3
15 4
19 5