pythonpandas

How to explode the column value without duplicating the other columns values in panda dataframe?


I have df like this:

id       ColumnA           ColumnB           ColumnC
1        Audi_BMW_VW       BMW_Audi             VW
2          VW                 Audi            Audi_BMW_VW

I want to explode the columns based on explode when _ appear. For example for "Column A" like this

df['Column A'].str.split('_')).explode('Column A')

but when i use similar query for column B then it repeats the values of column A, but i really want that only ID should duplicate. The desired output would be something like this:

id       ColumnA           ColumnB           ColumnC
1        Audi                BMW             VW
1         BMW                Audi
1         VW                  
2         VW                 Audi            Audi
2                                            BMW
2                                            VW

Solution

  • Lots of reshaping. The key point is to stack then call Series.str.split on a single Series with the id as the Index.

    (df.set_index('id')             # keep 'id' bound to cells in the row 
       .stack()                     # to a single Series
       .str.split('_', expand=True) # split into separate cells on '_'
       .unstack(-1).stack(0)        # original column labels back to columns
       .reset_index(-1, drop=True)  # remove split number label
    )
    

       ColumnA ColumnB ColumnC
    id                        
    1     Audi     BMW      VW
    1      BMW    Audi    None
    1       VW    None    None
    2       VW    Audi    Audi
    2     None    None     BMW
    2     None    None      VW