pythonpandasmulti-index

Drop duplicated index names in MultiIndex dataframe


After performing some operation on a dataset I ended up having a data frame like this:

df = pd.DataFrame([1, 2, 3, 4, 5], 
              index = pd.MultiIndex.from_arrays(np.array([[0, 0, 1, 1, 1],
                                                          [1448, 1448, 1448, 1448, 1448],
                                                          [0, 0, 1, 1, 1],
                                                          [1448, 1448, 1448, 1448, 1448],
                                                          ['22', '11', '14', '15', '19']]), 
                                                names = ['meas_id', 'hec_id', 'meas_id', 'hec_id', 'lab_id']))

print(df)

                                      0
meas_id hec_id meas_id hec_id lab_id   
0       1448   0       1448   22      1
                              11      2
1       1448   1       1448   14      3
                              15      4
                              19      5

As you can notice, meas_id and hec_id columns of MultiIndex are duplicated. I could use reset_index([3, 4], drop=True) to drop them, but this resulting data frame is actually a part of a list of data frames, where some of them have those index columns duplicated and some do not. Therefore my question is: how can I drop duplicated index columns from MultiIndex?

Please note that this question refer to duplicates in columns, not in row.


Solution

  • For a solution that generalises in dropping duplicate index names you can do:

    import numpy as np
    
    ixs = np.flatnonzero(pd.Index(df.index.names).duplicated()).tolist()
    df.reset_index(ixs, drop=True)
    
                           0
    meas_id hec_id lab_id   
    0       1448   22      1
                   11      2
    1       1448   14      3
                   15      4
                   19      5