How can I calculate the first visited date and the last visited date before an order was placed by the user?
USER ID TYPE DATE
1 Visited September 14, 2020
1 Visited October 4, 2020
1 Visited October 24, 2020
1 Ordered November 1, 2020
2 Visited September 14, 2020
2 Visited October 1, 2020
3 Visited September 1, 2020
3 Visited October 4, 2020
3 Visited October 4, 2020
3 Visited October 19, 2020
3 Ordered January 1, 2021
3 Visited February 11, 2021
3 Visited February 24, 2021
3 Visited March 1, 2021
3 Ordered April 21, 2021
Expected Output:
USER ID Ordered MIN DATE MAX DATE
1 1 September 14, 2020 October 24, 2020
2 0 September 14, 2020 NAT
3 1 September 1, 2020 October 19, 2020
3 2 February 11, 2021 March 1, 2021
Try:
df['DATE'] = pd.to_datetime(df['DATE'])
df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())\
.set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']\
.unstack('TYPE')\
.groupby(['USER ID', 'grp'], sort=False)\
.agg(Ordered=('Ordered','count'),
MIN_DATE=('Visited','first'),
MAX_DATE=('Visited','last'))\
.reset_index('grp', drop=True)\
.reset_index()
df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')
Output:
USER ID Ordered MIN_DATE MAX_DATE
0 1 1 September 14, 2020 October 24, 2020
1 2 0 September 14, 2020 NaN
2 3 1 September 01, 2020 October 19, 2020
3 3 2 February 11, 2021 March 01, 2021