I have df like this:
id ColumnA ColumnB ColumnC
1 Audi_BMW_VW BMW_Audi VW
2 VW Audi Audi_BMW_VW
I want to explode the columns based on explode when _ appear. For example for "Column A" like this
df['Column A'].str.split('_')).explode('Column A')
but when i use similar query for column B then it repeats the values of column A, but i really want that only ID should duplicate. The desired output would be something like this:
id ColumnA ColumnB ColumnC
1 Audi BMW VW
1 BMW Audi
1 VW
2 VW Audi Audi
2 BMW
2 VW
Lots of reshaping. The key point is to stack then call Series.str.split
on a single Series with the id
as the Index.
(df.set_index('id') # keep 'id' bound to cells in the row
.stack() # to a single Series
.str.split('_', expand=True) # split into separate cells on '_'
.unstack(-1).stack(0) # original column labels back to columns
.reset_index(-1, drop=True) # remove split number label
)
ColumnA ColumnB ColumnC
id
1 Audi BMW VW
1 BMW Audi None
1 VW None None
2 VW Audi Audi
2 None None BMW
2 None None VW