Suppose I have a DataFrame with the following format of strings separated by commas:
Index | ColumnName |
---|---|
0 | apple,peach,orange,pear, |
1 | orange, pear,apple |
2 | pear |
3 | peach,apple |
4 | orange |
The actual number of rows will be greater than 10,000.
I want to expand the DataFrame and sort the DataFrame by row 0.
My expected output is below, where None is of type NoneType:
Index | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | apple | peach | orange | pear |
1 | apple | None | orange | pear |
2 | None | None | None | pear |
3 | apple | peach | None | None |
4 | None | None | orange | None |
I have expanded the data using the following code:
df = df['ColumnName'].str.split(',', expand=True) # Expand initial DataFrame
However, I am unable to sort or reorder the data as desired despite trying various combinations of df.sort_values()
.
Here is another way:
s = df['columnName'].str.strip(',').str.split(', ?').explode()
s.set_axis(pd.MultiIndex.from_frame(s.groupby(s).ngroup().reset_index())).unstack()
Output:
0 0 1 2 3
index
0 apple orange peach pear
1 apple orange NaN pear
2 NaN NaN NaN pear
3 apple NaN peach NaN
4 NaN orange NaN NaN