pythonpy-datatable

How to find unique values by group in datatable Frame


I have created a datatable frame as follows,

DT_EX = dt.Frame({'cid':[1,2,1,2,3,2,4,2,4,5],
                  'cust_life_cycle':['Lead','Active','Lead','Active','Inactive','Lead','Active','Lead','Inactive','Lead']})

Here I have three unique customer life cycles and each of these counts are found as

DT_EX[:, count(), by(f.cust_life_cycle)]

Along with it, I have five customer IDs and these counts are as

DT_EX[:, count(), by(f.cid)]

Now I would like to see how many of unique customer ID's existed per each of customer life cycle,

DT_EX[:, {'unique_cids':dt.unique(f.cid)}, by(f.cust_life_cycle)]

It should display as Lead customer has got 3 unique customer ID's such as (1,2,5), Active user has got 2 unique customer ID's (2,4) so on forth.

I couldn't get it as expected, Could you please let me know how to get it fixed?.

FYI: I have tried to reproduce the same on R data.table frame, its working.

DT_EX[, uniqueN(cid), by=cust_life_cycle]

Solution

  • There is now a nunique implementation :

    DT_EX[:, f.cid.nunique(), 'cust_life_cycle']
    
       | cust_life_cycle    cid
       | str32            int64
    -- + ---------------  -----
     0 | Active               2
     1 | Inactive             2
     2 | Lead                 3
    [3 rows x 2 columns]