pythonpandasgroup-byunique-values

Pandas Dataframe: Find unique value from one column which has the largest number of unique values in another column


I have the following pandas dataframe

df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99], list('DAABBBBABCBDDD'), ['***','**','****','*','***','*','**','***','*','*','****','**','**','****']]).T
df.columns = ['col1','col2','col3']

Assume that col1 are companies and col2 are products types. I am looking for the company with the largest number of different product types.

So I am looking for which unique value from col1 has the largest number of unique values in col2

I have tried the following:

df.groupby(['col1'])['col2'].nunique()

which returns:

col1
66    2
77    3
88    1
99    2

Now I would like to get the value from col1 with the highest value in col2. Which is:

77    3

I have tried

df.groupby(['col2'])['col1'].nunique().max()

However I only receive the max of unique values in col2

3

Instead, I would like to know both the max value from col2 and to which value in col1 this belongs. I.e.

 77    3

Thank you for your help!


Solution

  • I would like to know both the max value from col2 and to which value in col1 this belongs.

    With your result, call both:

    result = df.groupby(['col1'])['col2'].nunique()
    result.idxmax()  # 77
    result.max()  # 3
    

    You could also convert it to a DataFrame before calling .loc[lambda d: d.idxmax()] but I don't know why you would want to do that.