python, numpy, group-by, graphlab, sframe

Group by in SFrame without installing graphlab


How can I use the groupby operation on an SFrame without installing GraphLab?

I would like to do some aggregation, but in every example I have found online, the aggregation functions come from GraphLab.

For example:

import graphlab.aggregate as agg

# sf is an existing SFrame with 'user_id' and 'rating' columns
user_rating_stats = sf.groupby(key_columns='user_id',
                               operations={
                                   'mean_rating': agg.MEAN('rating'),
                                   'std_rating': agg.STD('rating')
                               })

How can I use, say, numpy.mean and not agg.MEAN in the above example?


Solution

  • The sframe package contains the same aggregation module as the graphlab package, so you shouldn't need to resort to numpy.

    import sframe
    import sframe.aggregate as agg
    
    sf = sframe.SFrame({'user_id': [1, 1, 2],
                        'rating': [3.3, 3.6, 4.1]})
    grp = sf.groupby('user_id', {'mean_rating': agg.MEAN('rating'),
                                 'std_rating': agg.STD('rating')})
    print(grp)
    
    +---------+---------------------+-------------+
    | user_id |      std_rating     | mean_rating |
    +---------+---------------------+-------------+
    |    2    |         0.0         |     4.1     |
    |    1    | 0.15000000000000024 |     3.45    |
    +---------+---------------------+-------------+
    [2 rows x 3 columns]
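
The numbers in the table above are consistent with a population standard deviation (dividing by n rather than n - 1): for the ratings 3.3 and 3.6 the result is 0.15, whereas a sample standard deviation would be roughly 0.21. As a cross-check that needs neither sframe nor graphlab, here is a minimal sketch using only the Python standard library. The helper name `group_stats` is hypothetical and not part of any of these packages. Applying `numpy.mean` and `numpy.std` (which defaults to `ddof=0`) to each group's list should give the same values.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_stats(rows):
    """Group (user_id, rating) pairs and compute per-user mean and population std.

    Hypothetical helper for illustration; it mirrors the aggregation shown
    above but is not part of sframe or graphlab.
    """
    groups = defaultdict(list)
    for user_id, rating in rows:
        groups[user_id].append(rating)

    # pstdev divides by n, matching the 0.15 shown for user 1 in the table.
    return {
        user_id: {"mean_rating": mean(ratings), "std_rating": pstdev(ratings)}
        for user_id, ratings in groups.items()
    }


# Same data as the SFrame example above.
rows = [(1, 3.3), (1, 3.6), (2, 4.1)]
stats = group_stats(rows)

for user_id, values in stats.items():
    print(user_id, values)
```

For large data the built-in aggregators are still the better choice, since they run inside the SFrame engine instead of pulling every row into Python.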