Tags: pandas, group-by, frequency

Fastest way to get value frequencies stored in dictionary format from a pandas groupby


To calculate the frequency of each value per id, we can combine groupby with value_counts:

>>> df = pd.DataFrame({"id":[1,1,1,2,2,2], "col":['a','a','b','a','b','b']})
>>> df
   id col
0   1   a
1   1   a
2   1   b
3   2   a
4   2   b
5   2   b
>>> df.groupby('id')['col'].value_counts()
id  col
1   a      2
    b      1
2   b      2
    a      1

But I would like the results stored in dictionary format, not as a Series. How can I achieve that, and keep it fast on a large dataset? The ideal format is:

id
1    {'a': 2, 'b': 1}
2    {'a': 1, 'b': 2}

Solution

  • You can unstack the groupby result to get a dict-of-dicts:

    df.groupby('id')['col'].value_counts().unstack().to_dict(orient='index')
    # {1: {'a': 2, 'b': 1}, 2: {'a': 1, 'b': 2}}
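
    One caveat: when some (id, col) pairs never occur, unstack fills the gaps with NaN, and those NaNs end up in the resulting dicts. Passing fill_value=0 to unstack avoids that — a small sketch using a hypothetical frame df2 with a missing pair:

```python
import pandas as pd

# hypothetical frame where id 2 never has 'a'
df2 = pd.DataFrame({"id": [1, 1, 2], "col": ['a', 'b', 'b']})

# fill_value=0 replaces the would-be NaN for the missing (2, 'a') pair
counts = df2.groupby('id')['col'].value_counts().unstack(fill_value=0)
result = counts.to_dict(orient='index')
# {1: {'a': 1, 'b': 1}, 2: {'a': 0, 'b': 1}}
```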
    

    If you want a Series of dicts, use agg instead of to_dict — though note that the result below is indexed by col, with each dict keyed by id:

    df.groupby('id')['col'].value_counts().unstack().agg(pd.Series.to_dict)
    
    col
    a    {1: 2, 2: 1}
    b    {1: 1, 2: 2}
    dtype: object
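
    To get a Series indexed by id instead, matching the ideal format in the question, the unstacked frame can be reduced row-wise — a small variation on the above, using apply over axis=1:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                   "col": ['a', 'a', 'b', 'a', 'b', 'b']})

# convert each row of the unstacked counts to a dict keyed by col
s = df.groupby('id')['col'].value_counts().unstack().apply(pd.Series.to_dict, axis=1)
# id
# 1    {'a': 2, 'b': 1}
# 2    {'a': 1, 'b': 2}
# dtype: object
```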
    

    I don't recommend storing data in this format; object columns are generally more troublesome to work with.


    If unstacking introduces NaNs (because some (id, col) combinations are missing), try an alternative with GroupBy.agg, which builds each dict from only the values present in that group:

    df.groupby('id')['col'].agg(lambda x: x.value_counts().to_dict())
    
    id
    1    {'a': 2, 'b': 1}
    2    {'b': 2, 'a': 1}
    Name: col, dtype: object