I have a Pandas dataframe that records the performance of students in exams in different classes and it looks like:
Class_ID Class_size Student_Number IQ Hours Score Place
1 3 1 101 10 98 1
1 3 2 99 19 80 3
1 3 3 130 3 95 2
2 5 1 93 5 50 5
2 5 2 103 9 88 3
2 5 3 112 12 99 2
2 5 4 200 10 100 1
2 5 5 90 19 78 4
3 2 1 100 12 84 2
3 2 2 102 13 88 1
and I would like to convert the above long format into wide format, using Student_Number
as columns
and Class_ID, Class_size
as index
, with the last column Top
being the student who comes first in that class, so the desire outcome looks like:
Class_ID Class_size IQ_1 IQ_2 IQ_3 IQ_4 IQ_5 Hours_1 Hours_2 Hours_3 Hours_4 Hours_5 Score_1 Score_2 Score_3 Score_4 Score_5 Top
1 1 101 99 130 NaN NaN 10 19 3 NaN NaN 98 80 95 NaN NaN 1
2 5 93 103 112 200 90 5 9 12 10 19 50 88 99 100 78 4
3 2 100 102 NaN NaN NaN 12 13 NaN NaN NaN 84 88 NaN NaN NaN 2
And here is what I have tried:
out = df.pivot_table(index=['Class_ID', 'Class_size'],
columns='Student_Number',
values=['IQ', 'Hours', 'Score'])
out.columns = [f'{x}_{y}' for x,y in out.columns]
out_dummy = out.reset_index()
df_wide = out_dummy
However, I have no idea how to create the last column Top
. And also, the above code seems to be rather slow, and since my original dataframe is quite huge (~300,000rows), I would like to ask is there any quicker way to do that too. Thank you so much in advance.
A simple method using apply:
out["Top"] = out.apply(lambda row: 1 + np.argmax(row["Score_1":"Score_5"]) , axis=1)