I want a distinct count aggregation, i.e. the number of unique bid_id values per job_id.
I have this code:
data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :count)
I want something like this:
data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :distinct_count)
I know no statistical method like that is implemented yet. Is there another way to do it?
I found one way to do this:
data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max,
                                  bid_id: lambda { |x| x.uniq.size })
Or, equivalently, with the arrow lambda literal:
data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max,
                                  bid_id: ->(x) { x.uniq.size })
I am not sure if it is the right way, but it seems to work.
This pandas solution helped me.
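For anyone who wants to check what that callable computes per group, here is a minimal plain-Ruby sketch with no Daru involved. The sample rows and the helper name distinct_counts_by are made up for illustration and are not part of any library; only the uniq.size callable comes from the code above.

```ruby
# Hypothetical helper (not a Daru method): groups an array of row hashes by
# `key` and applies `counter` to the values of `column` within each group.
# The default counter is the same callable passed to aggregate above.
def distinct_counts_by(rows, key, column, counter = ->(values) { values.uniq.size })
  rows.group_by { |row| row[key] }
      .transform_values { |group| counter.call(group.map { |row| row[column] }) }
end

# Made-up sample data standing in for the data frame.
rows = [
  { job_id: 1, bid_id: 10 },
  { job_id: 1, bid_id: 10 }, # duplicate bid, counted once
  { job_id: 1, bid_id: 11 },
  { job_id: 2, bid_id: 20 }
]

result = distinct_counts_by(rows, :job_id, :bid_id)
result.each { |job_id, count| puts "job #{job_id}: #{count} distinct bids" }
```

With this data, job 1 ends up with 2 distinct bids and job 2 with 1. Note that on a plain Array, uniq keeps nil as a value of its own, so missing bids would count as one extra distinct value unless they are dropped first (for example with compact).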