ruby dataframe daru

How to get distinct count in aggregate


I simply want a distinct-count aggregation, i.e. the number of unique values per group.

I have this code:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :count)

I want something like this:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :distinct_count)

I know that no statistical method like that is implemented yet. Is there any other way?


Solution

  • I found one way to do this:

    data_frame = data_frame.group_by(:job_id)
                           .aggregate(job_id: :max,
                                      bid_id: lambda { |x| x.uniq.size })
    

    or, perhaps more idiomatically, the same thing with the stabby-lambda syntax:

    data_frame = data_frame.group_by(:job_id)
                           .aggregate(job_id: :max,
                                      bid_id: ->(x) { x.uniq.size })
    

    I am not sure whether this is the right way, but it seems to work.

    This pandas solution helped me.
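
To make clear what the lambda computes for each group, here is a minimal plain-Ruby sketch that does not use daru at all. The `rows` data and the names `distinct_count` and `distinct_bids` are made up for illustration; it only assumes that whatever the aggregation passes to the lambda responds to `uniq` and `size`, as an Array does.

```ruby
# Hypothetical sample data: one hash per row (not taken from the question).
rows = [
  { job_id: 1, bid_id: 10 },
  { job_id: 1, bid_id: 10 },
  { job_id: 1, bid_id: 11 },
  { job_id: 2, bid_id: 20 }
]

# Same callable as in the answer: the number of unique values in a group.
distinct_count = ->(values) { values.uniq.size }

# Group the rows by job_id, then apply the callable to each group's bid_id values.
distinct_bids = rows
  .group_by { |row| row[:job_id] }
  .transform_values { |group| distinct_count.call(group.map { |row| row[:bid_id] }) }

distinct_bids.each do |job_id, count|
  puts "job #{job_id}: #{count} distinct bid(s)"
end
```

Here job 1 has the bids 10, 10 and 11, so its distinct count is 2 while a plain count would give 3.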