I've got a dataframe of the form:
name 0 1 2
0 A 4 2 1
1 B 2 3 4
2 C 1 3 2
This is the result of grouping and summarizing data earlier in my real-world data process.
What I would like to do now is to expand (explode?) each row so that, for every column except name, the column title is repeated down as many rows as the value in that cell, with the remaining rows filled with nan, so that the dataframe ends up like this:
name 0 1 2
------------------
A 0 1 2
A 0 1 nan
A 0 nan nan
A 0 nan nan
B 0 1 2
B 0 1 2
B nan 1 2
B nan nan 2
C 0 1 2
C nan 1 2
C nan 1 nan
I've tried a bunch of stuff with df.iterrows(), assigning new columns to an empty dataframe sized to the max of each row and padding with nans, but it ended up buggy and really messy. It would be great if any of you pandas experts could take a look. Thank you for any suggestions!
import pandas as pd
df = pd.DataFrame({'name': ['A','B','C'], 0:[4,2,1], 1:[2,3,3], 2:[1,4,2]})
You can repeat each row as many times as the largest value in that row, then use numpy broadcasting to mask the values that are not needed:
import numpy as np

# get the list of columns other than name
cols = list(df.columns.difference(['name'], sort=False))
# repeat each row as many times as its largest count
out = df.loc[df.index.repeat(df[cols].max(axis=1))]
# keep the column label where the count is greater than the row's
# position within its group (0, 1, 2, ...), otherwise set NaN
out[cols] = np.where(out[cols].gt(out.groupby('name').cumcount(), axis=0),
                     np.repeat([cols], len(out), axis=0), np.nan)
Output:
name 0 1 2
0 A 0.0 1.0 2.0
0 A 0.0 1.0 NaN
0 A 0.0 NaN NaN
0 A 0.0 NaN NaN
1 B 0.0 1.0 2.0
1 B 0.0 1.0 2.0
1 B NaN 1.0 2.0
1 B NaN NaN 2.0
2 C 0.0 1.0 2.0
2 C NaN 1.0 2.0
2 C NaN 1.0 NaN
Intermediates:
df.loc[df.index.repeat(df[cols].max(axis=1))]
name 0 1 2
0 A 4 2 1
0 A 4 2 1
0 A 4 2 1
0 A 4 2 1
1 B 2 3 4
1 B 2 3 4
1 B 2 3 4
1 B 2 3 4
2 C 1 3 2
2 C 1 3 2
2 C 1 3 2
out[cols].gt(out.groupby('name').cumcount(), axis=0)
0 1 2
0 True True True
0 True True False
0 True False False
0 True False False
1 True True True
1 True True True
1 False True True
1 False False True
2 True True True
2 False True True
2 False True False
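For reference, the same repeat-then-mask idea can be sketched as one self-contained function. This is only a sketch under a few assumptions: the function name expand_counts and its key parameter are made up for illustration, the column labels other than name are assumed to be numeric, and the position of each repeated row is computed directly with numpy instead of groupby('name').cumcount(), so it does not rely on the names being unique.

```python
import numpy as np
import pandas as pd


def expand_counts(df, key="name"):
    """Hypothetical helper: repeat each row max(count) times and keep a
    column label only while the count exceeds the row's position."""
    cols = [c for c in df.columns if c != key]
    counts = df[cols].to_numpy()
    repeats = counts.max(axis=1)

    # position of every repeated row within its source row: 0, 1, ..., n-1
    pos = np.concatenate([np.arange(n) for n in repeats])

    # broadcast: (rows, cols) counts compared against a (rows, 1) position column
    mask = np.repeat(counts, repeats, axis=0) > pos[:, None]

    # assumption: column labels are numeric, so they can share a float array with NaN
    labels = np.asarray(cols, dtype=float)
    values = np.where(mask, labels, np.nan)

    out = pd.DataFrame(values, columns=cols, index=df.index.repeat(repeats))
    out.insert(0, key, df[key].to_numpy().repeat(repeats))
    return out


df = pd.DataFrame({"name": ["A", "B", "C"], 0: [4, 2, 1], 1: [2, 3, 3], 2: [1, 4, 2]})
result = expand_counts(df)
print(result)
```

Working on the raw arrays keeps the mask independent of index alignment; the groupby version above is shorter if the names are known to be unique.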