The dataframe looks like this:
Cluster | Genre 1 | Genre 2 | Genre 3 | Genre 4 | Genre 5 |
---|---|---|---|---|---|
1 | 10 | 31 | 5 | 3 | 23 |
2 | 53 | 12 | 6 | 9 | 7 |
3 | 44 | 73 | 1 | 9 | 13 |
As output, I want something like this, so I can see which genres are dominant in each cluster:
Cluster | 1st | 2nd | 3rd |
---|---|---|---|
1 | Genre 2 | Genre 5 | Genre 1 |
2 | Genre 1 | Genre 2 | Genre 4 |
3 | Genre 2 | Genre 1 | Genre 5 |
I want to show the top 3 genres from each cluster in a graph, but I have no idea how to do this per row instead of per column. Is anyone here familiar with this?
You can use numpy.argsort on df.values with axis=1, keep the positions of the three largest values in each row, and use df.columns to turn those positions into column names:
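import numpy as np
import pandas as pd

df = df.set_index('Cluster')
# Negate the values so argsort orders each row from largest to smallest,
# then keep the column positions of the first three.
top3 = np.argsort(-df.values, axis=1)[:, :3]
# Index a plain NumPy array of column names; recent pandas versions
# reject 2-D indexing directly on df.columns.
res = pd.DataFrame(df.columns.to_numpy()[top3], columns=['1st', '2nd', '3rd'])
print(res)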
Output:
1st 2nd 3rd
0 Genre 2 Genre 5 Genre 1
1 Genre 1 Genre 2 Genre 4
2 Genre 2 Genre 1 Genre 5
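For reference, here is a self-contained sketch of the same argsort approach that rebuilds the sample data from the question and keeps the Cluster values as the row index instead of 0, 1, 2. The helper name `top_genres` and its `labels` parameter are made up for illustration; they are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cluster": [1, 2, 3],
    "Genre 1": [10, 53, 44],
    "Genre 2": [31, 12, 73],
    "Genre 3": [5, 6, 1],
    "Genre 4": [3, 9, 9],
    "Genre 5": [23, 7, 13],
}).set_index("Cluster")


def top_genres(frame, labels=("1st", "2nd", "3rd")):
    """Hypothetical helper: names of the largest columns in each row."""
    # Negating the values makes argsort order each row from largest to
    # smallest; a stable sort keeps the original column order for ties.
    order = np.argsort(-frame.to_numpy(), axis=1, kind="stable")
    order = order[:, :len(labels)]
    # Map column positions back to column names, keeping the row index.
    names = frame.columns.to_numpy()[order]
    return pd.DataFrame(names, index=frame.index, columns=list(labels))


top3 = top_genres(df)
print(top3)
```

Keeping the Cluster index means each row of the result can be matched back to its cluster directly, which should help when the result is passed on to a plotting step.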