python, pandas, lambda, valueerror, leap-year

Leap Year leading to ValueError: day is out of range for month


I have a dataframe of people, and I am trying to express each person's age as a decimal number of years, relative to a specific date that I derive from the name of the CSV file. This code works until my dataset contains someone born on February 29.

So for 2016-people.csv I have the following:

name,dob
John,1989-12-14
Luc,1985-01-10
Picard,1987-10-13
Patrick,1986-02-19
Swayze,1987-06-10

My code is as follows:

import re

import pandas as pd

csv = "2016-people.csv"  # path of the input CSV; the year is taken from its name

# read the CSV, parsing dob as datetimes
df = pd.read_csv(csv, index_col=0, parse_dates=["dob"])

# form the reference date: <year from file name> + "0915" (September 15)
date_string = re.split(r'\s|-', csv)[0] + '0915'

d_date = pd.to_datetime(date_string, format='%Y%m%d', errors='coerce')

# whole years: subtract one if this year's birthday has not happened yet
df['age'] = df['dob'].apply(
    lambda x: (d_date.year - x.year
               - ((d_date.month, d_date.day) < (x.month, x.day)))
)

# fraction of a year elapsed since the last birthday
# (x.replace(year=...) is the call that raises for a February 29 dob)
df['fractional_age'] = df['dob'].apply(
    lambda x: (d_date - pd.to_datetime(x.replace(year=d_date.year))).days / 365.25
    if (x.month, x.day) <= (d_date.month, d_date.day)
    else (d_date - pd.to_datetime(x.replace(year=d_date.year - 1))).days / 365.25
)

df['age'] = df['age'] + df['fractional_age']

df = df.drop(columns=['fractional_age'])

This works, and produces the following:

name,dob,age
John,1989-12-14,26.7556468172484
Luc,1985-01-10,31.6817248459958
Picard,1987-10-13,28.9253935660506
Patrick,1986-02-19,30.5722108145106
Swayze,1987-06-10,29.2655715263518
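As a sanity check on the first output row: for 2016-people.csv the reference date is 2016-09-15, and John's last birthday before it was 2015-12-14, so his age reduces to 26 whole years plus 276 days / 365.25. The snippet below just recomputes that one row by hand.

```python
import pandas as pd

d_date = pd.Timestamp("2016-09-15")  # reference date for 2016-people.csv
dob = pd.Timestamp("1989-12-14")     # John's dob

# December 14 comes after September 15, so the last birthday was in 2015
last_birthday = dob.replace(year=d_date.year - 1)
days_since = (d_date - last_birthday).days  # this span includes 2016-02-29
whole_years = (d_date.year - 1) - dob.year  # completed years at the last birthday
age = whole_years + days_since / 365.25

print(days_since, age)  # 276 and roughly 26.7556468
```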

However, if I pass this function a CSV that contains a dob of February 29 (a leap day), it throws the following ValueError:

File "...\addage.py", line 22, in add_age
    df['fractional_age'] = df['dob'].apply(
                           ^^^^^^^^^^^^^^^^
  File "...\AppData\Roaming\Python\Python311\site-packages\pandas\core\series.py", line 4630, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\AppData\Roaming\Python\Python311\site-packages\pandas\core\apply.py", line 1025, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "...\AppData\Roaming\Python\Python311\site-packages\pandas\core\apply.py", line 1076, in apply_standard
    mapped = lib.map_infer(
             ^^^^^^^^^^^^^^
  File "pandas\_libs\lib.pyx", line 2834, in pandas._libs.lib.map_infer
  File "...\addage.py", line 23, in <lambda>
    lambda x: (d_date - pd.to_datetime(x.replace(year=d_date.year))).days / 365.25
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\tslibs\timestamps.pyx", line 2279, in pandas._libs.tslibs.timestamps.Timestamp.replace
  File "pandas\_libs\tslibs\timestamps.pyx", line 155, in pandas._libs.tslibs.timestamps.create_timestamp_from_ts
ValueError: day is out of range for month
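The last frames point at `Timestamp.replace`: it swaps the year field without adjusting the day, so a February 29 dob cannot be moved into a year that has no February 29. Because the reference date is always September 15, a February 29 dob always takes the first branch of the lambda, so the call fails exactly when the year taken from the file name is not a leap year. A minimal reproduction, using a hypothetical dob of 1988-02-29:

```python
import pandas as pd

dob = pd.Timestamp("1988-02-29")  # hypothetical leap-day dob

# Works: 2016 is a leap year, so 2016-02-29 exists
print(dob.replace(year=2016))

# Fails: 2017 has no February 29, which is the error in the traceback
failed = False
try:
    dob.replace(year=2017)
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```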

How do I resolve this?
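One way to keep the original birthday-based logic is to catch the `ValueError` and pick a stand-in date for leap-day birthdays in non-leap years. This sketch assumes February 28 as that stand-in (March 1 is an equally common convention), and the helpers `last_birthday` and `fractional_age` are hypothetical names, not part of pandas.

```python
import pandas as pd


def last_birthday(dob: pd.Timestamp, ref: pd.Timestamp) -> pd.Timestamp:
    """Most recent birthday on or before ref (hypothetical helper)."""
    def anniversary(year: int) -> pd.Timestamp:
        try:
            return dob.replace(year=year)
        except ValueError:
            # Only a February 29 dob can fail here; assume February 28 instead
            return dob.replace(year=year, day=28)

    this_year = anniversary(ref.year)
    return this_year if this_year <= ref else anniversary(ref.year - 1)


def fractional_age(dob: pd.Timestamp, ref: pd.Timestamp) -> float:
    """Completed years at the last birthday plus the elapsed fraction (days / 365.25)."""
    lb = last_birthday(dob, ref)
    return (lb.year - dob.year) + (ref - lb).days / 365.25


# Usage: one row from the question plus a hypothetical leap-day dob,
# against a non-leap reference date
d_date = pd.Timestamp("2017-09-15")
df = pd.DataFrame(
    {"name": ["John", "Leap"],
     "dob": pd.to_datetime(["1989-12-14", "1988-02-29"])}
)
df["age"] = df["dob"].apply(lambda x: fractional_age(x, d_date))
print(df)
```

Unlike dividing total elapsed time by 365.25, this keeps the whole-year part exact on each birthday, at the cost of choosing a leap-day convention explicitly.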


Solution

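  • I found a solution that works well enough for me. Instead of moving each dob into the target year, I compute the time between dob and the target date in seconds and convert that back to years. This single line replaces both the age and fractional_age steps: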

    df['age'] = (d_date - df['dob']).dt.total_seconds() / (60*60*24*365.25)
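For reference, here is that one-liner applied to a few rows, including a hypothetical 1988-02-29 dob and a non-leap reference date (2017-09-15, also an assumption). It runs without error because it never constructs a February 29 in another year; it only subtracts dates.

```python
import pandas as pd

# Two rows from the question plus a hypothetical leap-day dob
df = pd.DataFrame(
    {"name": ["John", "Luc", "Leap"],
     "dob": pd.to_datetime(["1989-12-14", "1985-01-10", "1988-02-29"])}
)
d_date = pd.Timestamp("2017-09-15")  # non-leap target year

# Vectorized: subtract every dob at once, then divide by the seconds in 365.25 days
seconds_per_year = 60 * 60 * 24 * 365.25
df["age"] = (d_date - df["dob"]).dt.total_seconds() / seconds_per_year
print(df)
```

Dividing by 365.25 treats every year as the same average length, so the result can drift from the calendar by about a day: a span of exactly 365 days evaluates to 365 / 365.25, just under 1.0, so someone can appear to turn a year older a day late.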