I have this csv file "rfm_data.csv":
CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product C
2188 11-04-23 463.70 Product A
4608 11-04-23 80.28 Product A
2559 11-04-23 221.29 Product A
I read and transform data with this code:
data = pd.read_csv("rfm_data.csv")
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'], format='%d-%m-%y')
data['Recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days
When I print (data) I get this error message:
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?
If I delete the dt.day from the last line of code I got this result:
CustomerID PurchaseDate TransactionAmount ProductInformation Recency
8814 2023-04-11 943.31 Product C 140 days, 0:00:00
2188 2023-04-11 463.70 Product A 140 days, 0:00:00
4608 2023-04-11 80.28 Product A 140 days, 0:00:00
2559 2023-04-11 221.29 Product A 140 days, 0:00:00
But what I want in [Recency] is only the number of days to make further calculations.
Your problem lies in calling .dt.date
, which returns a vanilla Python date object column - that has no dt
accessor. Since your input only has dates, normalizing to the date is not needed. If you need to do it anyways (other use case maybe), use .dt.floor("d")
.
EX:
from io import StringIO
import pandas as pd
s = """CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product-C
2188 11-04-23 463.70 Product-A
4608 11-04-23 80.28 Product-A
2559 11-04-23 221.29 Product-A"""
data = pd.read_csv(StringIO(s), sep=" ")
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'], format='%d-%m-%y')
data['Recency'] = (pd.Timestamp("now").floor("d") - data['PurchaseDate']).dt.days
print(data)
CustomerID PurchaseDate TransactionAmount ProductInformation Recency
0 8814 2023-04-11 943.31 Product-C 140
1 2188 2023-04-11 463.70 Product-A 140
2 4608 2023-04-11 80.28 Product-A 140
3 2559 2023-04-11 221.29 Product-A 140
Note that you can also use pd.Timestamp("now").floor("d")
to get today's date, which makes the code a bit more clean since you use pandas exclusively.