[SOLVED] How can I get the first three max values from each row in a Pandas dataframe?

The dataframe looks like this:

Cluster	Genre 1	Genre 2	Genre 3	Genre 4	Genre 5
1	10	31	5	3	23
2	53	12	6	9	7
3	44	73	1	9	13

As output, I want something like this, so I can see which genres are the dominant ones in each cluster:

Cluster	1st	2nd	3rd
1	Genre 2	Genre 5	Genre 1
2	Genre 1	Genre 2	Genre 4
3	Genre 2	Genre 1	Genre 5

I want to show the top 3 genres from each cluster in a graph, but I have no idea how to do this per row instead of per column. Is anyone here familiar with this?

Solution

You can use numpy.argsort on the negated values with axis=1 to order each row's columns from largest to smallest, keep the first three positions, and look up the corresponding names in df.columns:

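import pandas as pd
import numpy as np

df = df.set_index('Cluster')
# Column positions of the three largest values in each row, largest first
top3 = np.argsort(-df.to_numpy(), axis=1)[:, :3]
# Index a plain NumPy array of names (a pandas Index rejects 2-D indexers
# in recent versions) and keep the Cluster labels as the row index
res = pd.DataFrame(df.columns.to_numpy()[top3],
                   index=df.index,
                   columns=['1st', '2nd', '3rd'])
print(res)

Output:

             1st      2nd      3rd
Cluster
1        Genre 2  Genre 5  Genre 1
2        Genre 1  Genre 2  Genre 4
3        Genre 2  Genre 1  Genre 5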
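
Since the goal is to plot the dominant genres, it may help to keep the matching values next to the names. Below is a minimal, self-contained sketch of the same argsort idea, assuming the sample data from the question. The names top_names, top_values and labels are only illustrative, and kind="stable" is an optional choice so that ties resolve to the left-most column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "Cluster": [1, 2, 3],
        "Genre 1": [10, 53, 44],
        "Genre 2": [31, 12, 73],
        "Genre 3": [5, 6, 1],
        "Genre 4": [3, 9, 9],
        "Genre 5": [23, 7, 13],
    }
).set_index("Cluster")

values = df.to_numpy()
# Column positions of the three largest values in each row, largest first.
# kind="stable" keeps the left-most column first when values tie.
order = np.argsort(-values, axis=1, kind="stable")[:, :3]

labels = ["1st", "2nd", "3rd"]  # illustrative column labels
# Genre names of the top three columns per cluster
top_names = pd.DataFrame(df.columns.to_numpy()[order],
                         index=df.index, columns=labels)
# The corresponding values, picked row by row with the same positions
top_values = pd.DataFrame(np.take_along_axis(values, order, axis=1),
                          index=df.index, columns=labels)

print(top_names)
print(top_values)
```

With both frames aligned on Cluster, top_values could be passed to a bar plot and top_names used for the bar labels, if that suits the graph you have in mind.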