pythonpandasfilterfeature-extractiongroup

How to select user_id rows pandas


How can I calculate the first visited date and the last visited date before an order was placed by the user?

USER ID TYPE    DATE
1   Visited September 14, 2020
1   Visited October 4, 2020
1   Visited October 24, 2020
1   Ordered November 1, 2020
2   Visited September 14, 2020
2   Visited October 1, 2020
3   Visited September 1, 2020
3   Visited October 4, 2020
3   Visited October 4, 2020
3   Visited October 19, 2020
3   Ordered January 1, 2021
3   Visited February 11, 2021
3   Visited February 24, 2021
3   Visited March 1, 2021
3   Ordered April 21, 2021

Expected Output:

USER ID Ordered MIN DATE    MAX DATE
1   1   September 14, 2020  October 24, 2020
2   0   September 14, 2020  NAT
3   1   September 1, 2020   October 19, 2020
3   2   February 11, 2021   March 1, 2021

Solution

  • Try:

    df['DATE'] = pd.to_datetime(df['DATE'])
    
    df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())\
               .set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']\
               .unstack('TYPE')\
               .groupby(['USER ID', 'grp'], sort=False)\
               .agg(Ordered=('Ordered','count'), 
                    MIN_DATE=('Visited','first'), 
                    MAX_DATE=('Visited','last'))\
               .reset_index('grp', drop=True)\
               .reset_index()
    
    df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
    df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
    
    df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
    df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')
    

    Output:

       USER ID  Ordered            MIN_DATE          MAX_DATE
    0        1        1  September 14, 2020  October 24, 2020
    1        2        0  September 14, 2020               NaN
    2        3        1  September 01, 2020  October 19, 2020
    3        3        2   February 11, 2021    March 01, 2021