I have a dataframe as below:
df = pd.DataFrame({'Id': ['abs1', 'abs2', 'abs3', 'plo2', '201805', '201806', '202011', 'pctx1'],
'Date': ['2021-06-15', '2021-06-13', '2021-06-07', '2021-05-30',
'2021-05-12', '2021-04-28', '2021-04-15', '2021-02-01']})
I wish to bin the Date column into several groups in a new column, called Date_Bin, the rule is: from today's date, if the value in the Date is less than 7 days, then the value in the new column will be 'last 7 days', if the value is less than 14 days and more than 7 days from today, the value is '7 to 14 days', if the value is less than 30 days and more than 14 days, then the value is '14 to 30 days', same logic for 30 to 60 days, 60 to 90 days, and more than 90 days. The ideal output is like this:
Id Date Date_Bin
0 abs1 2021-06-15 last 7 days
1 abs2 2021-06-13 last 7 days
2 abs3 2021-06-07 7 to 14 days
3 plo2 2021-05-30 14 to 30 days
4 201805 2021-05-10 30 to 60 days
5 201806 2021-04-28 30 to 60 days
6 202011 2021-04-15 60 to 90 days
7 pctx1 2021-02-01 more than 90 days
As you can see the output, those are the only groups/bins I need for the data. I tried a couple of ways and did not work, so much appreciate it if anyone can help please.
Convert your dates with to_datetime
then subtract from today's normalized
date (so that we remove the time part) and get the number of days. Then use pd.cut
to group them appropriately.
Anything in the future gets labeled with NaN
.
import pandas as pd
import numpy as np
df['Date'] = pd.to_datetime(df['Date'])
s = (pd.to_datetime('today').normalize() - df['Date']).dt.days
df['Date_Bin'] = pd.cut(s, [0, 7, 14, 30, 60, 90, np.inf],
labels=['last 7 days', '7 to 14 days', '14 to 30 days',
'30 to 60 days', '60 to 90 days', 'more than 90 days'],
include_lowest=True)
print(df)
Id Date Date_Bin
0 abs1 2021-06-15 last 7 days
1 abs2 2021-06-13 last 7 days
2 abs3 2021-06-07 7 to 14 days
3 plo2 2021-05-30 14 to 30 days
4 201805 2021-05-12 30 to 60 days
5 201806 2021-04-28 30 to 60 days
6 202011 2021-04-15 60 to 90 days
7 pctx1 2021-02-01 more than 90 days
For future reproducibility, at the time of writing:
pd.to_datetime('today').normalize()
#Timestamp('2021-06-15 00:00:00')