pythonpandasdataframenumpypython-holidays

Python Identify US holidays in a timeseries dataframe


I have a dataframe with years of data. I want to detect, assign True/False or 1/0 if it is a Holiday.

My code:

df = pd.DataFrame(index=['2004-10-01', '2004-10-02', '2004-10-03', '2004-10-04', '2004-10-05',
       '2004-10-06', '2004-10-07', '2004-10-08', '2004-10-09', '2004-10-10',
       '2018-07-25', '2018-07-26', '2018-07-27', '2018-07-28', '2018-07-29',
       '2018-07-30', '2018-07-31', '2018-08-01', '2018-08-02', '2018-08-03'])

import holidays
# Detect US holidays
hldys = holidays.country_holidays('US')
# I have to apply this to each date in the dataframe index
df['isHoliday?'] = df.index.map(lambda x: hldy.get(x))

Present output:

            isHoliday?
2004-10-01  None
2004-10-02  None
2004-10-03  None
2004-10-04  None
2004-10-05  None
2004-10-06  None
2004-10-07  None
2004-10-08  None
2004-10-09  None
2004-10-10  None
2018-07-25  None
2018-07-26  None
2018-07-27  None
2018-07-28  None
2018-07-29  None
2018-07-30  None
2018-07-31  None
2018-08-01  None
2018-08-02  None

Update I found the solution

us_hldys = holidays.country_holidays('US')
df['isHoliday?'] = df.index.to_series().apply(lambda x: x in us_hldys)

isHoliday?
2004-10-01  False
2004-10-02  False
2004-10-03  False
2004-10-04  False
2004-10-05  False
2004-10-06  False
2004-10-07  False
2004-10-08  False
2004-10-09  False
2004-10-10  False
2018-07-25  False
2018-07-26  False
2018-07-27  False
2018-07-28  False
2018-07-29  False
2018-07-30  False
2018-07-31  False
2018-08-01  False
2018-08-02  False
2018-08-03  False

Solution

  • A different strategy is to pre-calculate holidays to avoid apply:

    df.index = pd.to_datetime(df.index)
    hol = pd.Series(holidays.country_holidays('US',  years=range(df.index.min().year,
                                                                 df.index.max().year+1)))
    df['isHoliday?'] = df.index.isin(hol)
    

    Output:

    >>> df
                isHoliday?
    2004-10-01       False
    2004-10-02       False
    2004-10-03       False
    2004-10-04       False
    2004-10-05       False
    2004-10-06       False
    2004-10-07       False
    2004-10-08       False
    2004-10-09       False
    2004-10-10       False
    2018-07-25       False
    2018-07-26       False
    2018-07-27       False
    2018-07-28       False
    2018-07-29       False
    2018-07-30       False
    2018-07-31       False
    2018-08-01       False
    2018-08-02       False
    2018-08-03       False
    
    >>> hol.sort_index()
    2004-01-01                New Year's Day
    2004-01-19    Martin Luther King Jr. Day
    2004-02-16         Washington's Birthday
    2004-05-31                  Memorial Day
    2004-07-04              Independence Day
                             ...            
    2018-10-08                  Columbus Day
    2018-11-11                  Veterans Day
    2018-11-12       Veterans Day (Observed)
    2018-11-22                  Thanksgiving
    2018-12-25                 Christmas Day
    Length: 169, dtype: object