python pandas dataframe offset python-holidays

KeyError resulting from apply and lambda function on DataFrame column


I generated a new column containing the 7th business day of each year and month in this df:

      YearOfSRC MonthNumberOfSRC    
0       2022             3    
1       2022             4          
2       2022             5             
3       2022             6            
4       2021             4   
... ... ... ...
20528   2022             1             
20529   2022             2             
20530   2022             3            
20531   2022             4             
20532   2022             5             

With this code:

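import pandas as pd

# Build the 1st of each month, step back one business day,
# then step forward seven to land on the 7th business day.
df['PredictionDate'] = (pd
 .to_datetime(df[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(pd.offsets.BusinessDay(1))
 .add(pd.offsets.BusinessDay(7))
)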

This produces the following dataframe (df_final) with the new column, PredictionDate:

       YearOfSRC    MonthNumberOfSRC    PredictionDate
0       2022             3              2022-03-09
1       2022             4              2022-04-11
2       2022             5              2022-05-10
3       2022             6              2022-06-09
4       2021             4              2021-04-09
... ... ... ...
20528   2022             1              2022-01-11
20529   2022             2              2022-02-09
20530   2022             3              2022-03-09
20531   2022             4              2022-04-11
20532   2022             5              2022-05-10

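As a sanity check, the step-back-one, step-forward-seven logic can be sketched on three of the months listed above. The three-row frame is a stand-in for the real data, not the original df.

```python
import pandas as pd

# Small stand-in frame using the same column names as the question.
df = pd.DataFrame({"YearOfSRC": [2022, 2022, 2022],
                   "MonthNumberOfSRC": [3, 4, 5]})

# First calendar day of each (year, month) pair.
first_of_month = pd.to_datetime(
    df[["YearOfSRC", "MonthNumberOfSRC"]]
    .set_axis(["year", "month"], axis=1)
    .assign(day=1)
)

# Step back one business day, then forward seven. This lands on the
# 7th business day whether or not the 1st itself is a weekday.
seventh = first_of_month - pd.offsets.BusinessDay(1) + pd.offsets.BusinessDay(7)

print(seventh.dt.strftime("%Y-%m-%d").tolist())
# ['2022-03-09', '2022-04-11', '2022-05-10']
```

May 2022 is the interesting case: the 1st is a Sunday, so subtracting one business day rolls back to Friday, April 29, and the seven-day count then starts on Monday, May 2.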

However, I would like to use CustomBusinessDay together with the Python holidays package so that, in months where a holiday in the first week pushes the 7th business day back by one business day, PredictionDate reflects that. CustomBusinessDay accepts a holidays parameter, so a modular solution would pass it the list of dates from the holidays library. I could hard-code the extra business day for every month with a holiday in the first week, but I would prefer something more dynamic. I tried the following in place of the code above, but it raises a KeyError:

df_final['PredictionDate'] = (pd
 .to_datetime(df_final[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(df_final.apply(lambda x : pd.offsets.CustomBusinessDay(1, holidays = holidays.US(years = x['YearOfSRC']).keys())))
 .add(df_final.apply(lambda x: pd.offsets.CustomBusinessDay(7, holidays = holidays.US(years = x['YearOfSRC']).keys())))
)

KeyError: 'YearOfSRC'

I'm sure I am implementing pandas apply and lambda functions incorrectly here, but I don't know why the error would be a key error when that's clearly a column in df_final.
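A small reproduction suggests where the KeyError comes from. The two-row frame below is a hypothetical stand-in for df_final; the point is only what the lambda receives under each axis setting.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_final.
df_final = pd.DataFrame({"YearOfSRC": [2022, 2021],
                         "MonthNumberOfSRC": [3, 4]})

# Default axis=0: the lambda receives each COLUMN as a Series whose index
# holds the row labels (0, 1), so the label 'YearOfSRC' is not found.
try:
    df_final.apply(lambda x: x["YearOfSRC"])
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# axis=1: the lambda receives each ROW as a Series indexed by column name.
years = df_final.apply(lambda x: x["YearOfSRC"], axis=1)
print(years.tolist())
# [2022, 2021]
```

So the column does exist in df_final, but inside the lambda the lookup is made against row labels rather than column names.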


Solution

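  • DataFrame.apply defaults to axis=0, so the lambda receives each column rather than each row, and x['YearOfSRC'] raises the KeyError. Pass axis=1, keeping years= and .keys() as in your attempt:

    import holidays
    import pandas as pd

    # int(...) converts the NumPy integer taken from the row into a plain Python int for holidays.US
    df_final['PredictionDate'] = (pd
     .to_datetime(df_final[['YearOfSRC', 'MonthNumberOfSRC']]
                    .set_axis(['year' ,'month'], axis=1)
                    .assign(day=1)
                  )
     .sub(df_final.apply(lambda x: pd.offsets.CustomBusinessDay(1, holidays=list(holidays.US(years=int(x['YearOfSRC'])).keys())), axis=1))
     .add(df_final.apply(lambda x: pd.offsets.CustomBusinessDay(7, holidays=list(holidays.US(years=int(x['YearOfSRC'])).keys())), axis=1))
    )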
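A possible alternative, sketched under assumptions: if the holiday list covers every year in the frame, one pair of CustomBusinessDay offsets can be shared by all rows, which avoids the row-wise apply. To keep the sketch self-contained, the holiday list below is written by hand (Independence Day 2022 fell on Monday, July 4); building it from the holidays package is shown only in a comment, and the two-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_final.
df_final = pd.DataFrame({"YearOfSRC": [2022, 2022],
                         "MonthNumberOfSRC": [6, 7]})

# Illustrative holiday list. With the holidays package it could be built as
# list(holidays.US(years=df_final["YearOfSRC"].unique().tolist()).keys()).
holiday_dates = ["2022-07-04"]  # Independence Day 2022, a Monday

# One offset pair shared by all rows, since the list spans every year needed.
back_one = pd.offsets.CustomBusinessDay(1, holidays=holiday_dates)
forward_seven = pd.offsets.CustomBusinessDay(7, holidays=holiday_dates)

first_of_month = pd.to_datetime(
    df_final[["YearOfSRC", "MonthNumberOfSRC"]]
    .set_axis(["year", "month"], axis=1)
    .assign(day=1)
)

df_final["PredictionDate"] = first_of_month - back_one + forward_seven
print(df_final["PredictionDate"].dt.strftime("%Y-%m-%d").tolist())
# ['2022-06-09', '2022-07-12']
```

June is unaffected, while July moves from the 11th to the 12th because July 4 is skipped. pandas may emit a PerformanceWarning here, since CustomBusinessDay is applied element by element rather than vectorized.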