I have the following pandas dataframe
df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99], list('DAABBBBABCBDDD'), ['***','**','****','*','***','*','**','***','*','*','****','**','**','****']]).T
df.columns = ['col1','col2','col3']
Assume that col1 are companies and col2 are products types. I am looking for the company with the largest number of different product types.
So I am looking for which unique value from col1 has the largest number of unique values in col2
I have tried the following:
df.groupby(['col1'])['col2'].nunique()
which returns:
col1
66 2
77 3
88 1
99 2
Now I would like to get the value from col1 with the highest value in col2. Which is:
77 3
I have tried
df.groupby(['col2'])['col1'].nunique().max()
However I only receive the max of unique values in col2
3
Instead, I would like to know both the max value from col2 and to which value in col1 this belongs. I.e.
77 3
Thank you for your help!
I would like to know both the max value from col2 and to which value in col1 this belongs.
With your result, call both:
result = df.groupby(['col1'])['col2'].nunique()
result.idxmax() # 77
result.max() # 3
You could also convert it to a DataFrame before calling .loc[lambda d: d.idxmax()]
but I don't know why you would want to do that.