Hi I would like to take the top 3 numbers for each person and and append the reason_comments into them. If there is a tie, I would like to just take the first one.
May i know how can i do this in python please?
Table 1:
id VarA VarB VarC VarD VarE
1 5 4 3 2 1
2 4 6 21 5 5
3 3 8 6 9 0
4 7 8 23 44 0
Table 2:
reason_code reason_comment
VarA A is high
VarB B is high
VarC C is high
VarD D is high
VarE E is high
Results:
id reason 1 reason 2 reason 3
1 A is high B is high C is high
2 C is high B is high D is high
3 D is high B is high C is high
4 D is high C is high B is high
There are possible ties, so is necessary remove them. So you can reshape DataFrame by DataFrame.melt
, sorting by DataFrame.sort_values
and remove duplicated by DataFrame.drop_duplicates
.
df1 = (df.melt('id')
.sort_values(['id','value'], ascending=[True, False])
.drop_duplicates(['id','value']))
Then for filter top3 is used GroupBy.cumcount
for possible reuse it for new column names in DataFrame.pivot
:
df1['g'] = df1.groupby('id').cumcount().add(1)
df1 = df1[df1['g'].le(3)]
Also use Series.map
for data by another DataFrame:
s = df2.set_index('reason_code')['reason_comment']
df1['variable'] = df1['variable'].map(s)
df1 = df1.pivot('id','g','variable').add_prefix('reason')
print (df)
g reason1 reason2 reason3
id
1 A is high B is high C is high
2 C is high B is high D is high
3 D is high B is high C is high
4 D is high C is high B is high
For convert id
to column and remove g
use:
df1 = df1.reset_index().rename_axis(None, axis=1)