pythonpandasdatetimetimedelta

Python, Can't Transform Date Array to days


I have this csv file "rfm_data.csv":

CustomerID PurchaseDate  TransactionAmount ProductInformation
8814       11-04-23             943.31          Product C
2188       11-04-23             463.70          Product A
4608       11-04-23              80.28          Product A
2559       11-04-23             221.29          Product A

I read and transform data with this code:

    data = pd.read_csv("rfm_data.csv")
    data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'],  format='%d-%m-%y')
    data['Recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days

When I print (data) I get this error message:

AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?

If I delete the dt.day from the last line of code I got this result:

CustomerID PurchaseDate  TransactionAmount ProductInformation Recency
8814       2023-04-11             943.31          Product C   140 days, 0:00:00
2188       2023-04-11             463.70          Product A   140 days, 0:00:00
4608       2023-04-11              80.28          Product A   140 days, 0:00:00
2559       2023-04-11             221.29          Product A   140 days, 0:00:00

But what I want in [Recency] is only the number of days to make further calculations.


Solution

  • Your problem lies in calling .dt.date, which returns a vanilla Python date object column - that has no dt accessor. Since your input only has dates, normalizing to the date is not needed. If you need to do it anyways (other use case maybe), use .dt.floor("d").

    EX:

    from io import StringIO
    import pandas as pd
    
    s = """CustomerID PurchaseDate TransactionAmount ProductInformation
    8814 11-04-23 943.31 Product-C
    2188 11-04-23 463.70 Product-A
    4608 11-04-23 80.28 Product-A
    2559 11-04-23 221.29 Product-A"""
    
    data = pd.read_csv(StringIO(s), sep=" ")
    data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'],  format='%d-%m-%y')
    data['Recency'] = (pd.Timestamp("now").floor("d") - data['PurchaseDate']).dt.days
    
    print(data)
       CustomerID PurchaseDate  TransactionAmount ProductInformation  Recency
    0        8814   2023-04-11             943.31          Product-C      140
    1        2188   2023-04-11             463.70          Product-A      140
    2        4608   2023-04-11              80.28          Product-A      140
    3        2559   2023-04-11             221.29          Product-A      140
    

    Note that you can also use pd.Timestamp("now").floor("d") to get today's date, which makes the code a bit more clean since you use pandas exclusively.