Say I have a dataframe
df = pd.DataFrame({'colA' : ['ABC', 'JKL', 'STU', '123'],
'colB' : ['DEF', 'MNO', 'VWX', '456'],
'colC' : ['GHI', 'PQR', 'YZ', '789'],}, index = [0,0,1,1])
colA colB colC
0 ABC DEF GHI
0 JKL MNO PQR
1 STU VWX YZ
1 123 456 789
Its guranteed that every pair will have the same index, so we would like the end result to be :
colA colB colC
0 ABC_JKL_0 DEF_MNO_0 GHI_PQR_0
1 STU_123_1 VWX_456_1 YZ_789_1
where the suffix _number
is the index of that group.
I tried doing this by iterating through rows but that's taking a lot of time.
I was thinking of something like .groupby(level=0)
but can't figure out the next aggregation apply
part
IIUC, you can try something like this using .agg and a lambda function or you can add it into the dataframe after the groupby:
df_out=df.groupby(level=0).agg(lambda x: '_'.join(x)+'_'+str(x.index[0]))
Output:
colA colB colC
0 ABC_JKL_0 DEF_MNO_0 GHI_pQR_0
1 STU_123_1 VWX_456_1 YZ_789_1
Or
df_out=df.groupby(level=0).agg('_'.join)
df_out = df_out.add('_'+df_out.index.to_series().astype(str), axis=0)
print(df_out)