pythonpandasgroup-bycountstatistics

How to find the most frequent value for each id


Let's say, I have number A and they call several people B

A   B

123 987
123 987
123 124
435 567
435 789
653 876
653 876
999 654
999 654
999 654
999 123

I want to find to whom the person in A has called maximum times and also the number of times.

OUTPUT:

A   B           Count
123 987         2
435 567 or 789  1
653 876         2
999 654         3

How one can think of it is,

A      B
123    987 2
       124 1
435    567 1
       789 1
653    876 2
999    654 3
       123 1

Can somebody help me out on how to do this?


Solution

  • We can get the desired output in 2 steps:

    1. Count values for each A (value_counts with 2 columns)
    2. Filter the rows with the max value for each A (boolean indexing)
    # count the unique values in rows
    # `sort_index` is optional
    df.value_counts(['A','B']).sort_index()
    
    A    B  
    123  124    1
         987    2
    435  567    1
         789    1
    653  876    2
    999  123    1
         654    3
    dtype: int64
    

    To get the highest values for each unique A:

    v = df.value_counts(['A','B'])
    agg_df = v[~v.reset_index(level=0).duplicated('A').values]
    

    You can also replace the second part with groupby.agg that gets the max of the count and its corresponding B value. Then the whole thing can be written in one line:

    agg_df = (
        df.value_counts(['A','B'])
        .reset_index()
        .groupby('A').agg({'B': 'first', 'count': 'max'})
    )
    
    A    B  
    999  654    3
    123  987    2
    653  876    2
    435  567    1
    dtype: int64