python pandas isin python-holidays

Pandas isin holidays.country_holidays incorrectly returns only False on 1st attempt but correct results on 2nd attempt


I'm struggling with the behavior of pandas .isin when checking whether local dates are local holidays.

I have a DataFrame X with UTC timestamps, which I convert to local dates, keeping only one row per date in x_daily:

import pandas as pd
import holidays

X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()

Now it gets weird: when I try to find the local holidays with .isin, it doesn't find any. When I check each element of local_date with in, all holidays are found correctly. Calling .isin again after that also finds the correct holidays.

de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)

What's a reliable and efficient way to add a boolean column that identifies my local holidays?

This is my console output: [image: console output screenshot]


Solution

  • The documentation of the holidays module says:

    To maximize speed, the list of holidays is built as needed on the fly, one calendar year at a time. When you instantiate the object, it is empty, but the moment a key is accessed it will build that entire year’s list of holidays. To prepopulate holidays, instantiate the class with the years argument:

    us_holidays = holidays.US(years=2020)

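    In other words, the object stays empty until a key is looked up, and each lookup populates only the year of the requested date.

    The implementation of isin first converts its argument to a list, which in your case yields an empty list, so the first call returns only False. The list comprehension with in then looks up every date, which populates 2000 and 2001, and that is why the second isin call finds the holidays.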

    You could change your code to

    de_holidays = holidays.country_holidays(country='DE', state='BW', years=[2000, 2001])

    and it should work as expected.
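
To make the mechanism concrete, here is a minimal, standard-library-only sketch of the behaviour described above. LazyHolidays and isin_via_list are hypothetical stand-ins invented for this illustration; they are not the holidays or pandas implementations, and the toy calendar only knows 1 January and 25 December. The sketch assumes just two things from the answer: years are built on first key access, and the membership check materialises its argument as a list first.

```python
from datetime import date


class LazyHolidays(dict):
    """Hypothetical stand-in for a lazily populated holiday calendar.

    Not the holidays library's code: it only mimics the documented behaviour
    of building one calendar year at a time on first access.
    """

    def __init__(self, years=()):
        super().__init__()
        self._built_years = set()
        # Optional prepopulation, analogous to passing years=... up front.
        for year in years:
            self._build_year(year)

    def _build_year(self, year):
        # Toy rule for illustration: only 1 January and 25 December count.
        if year not in self._built_years:
            self._built_years.add(year)
            self[date(year, 1, 1)] = "New Year's Day"
            self[date(year, 12, 25)] = "Christmas Day"

    def __contains__(self, key):
        # A membership test builds the year of the requested date first.
        self._build_year(key.year)
        return super().__contains__(key)


def isin_via_list(values, candidates):
    """Mimic a check that first materialises the candidates as a list."""
    materialised = list(candidates)  # iterates existing keys, builds nothing
    return [value in materialised for value in values]


dates = [date(2000, 12, 25), date(2000, 12, 26), date(2001, 1, 1)]

lazy = LazyHolidays()
first_try = isin_via_list(dates, lazy)    # calendar is still empty
with_in = [d in lazy for d in dates]      # each lookup builds its year
second_try = isin_via_list(dates, lazy)   # calendar is now populated

prepopulated = LazyHolidays(years=[2000, 2001])
reliable = isin_via_list(dates, prepopulated)

print(first_try)   # [False, False, False]
print(with_in)     # [True, False, True]
print(second_try)  # [True, False, True]
print(reliable)    # [True, False, True]
```

The prepopulated variant gives the right answer on the first call because nothing depends on an earlier lookup. In the real code, the same effect comes from passing years=, and the list of years could be collected from the distinct years in local_date rather than hard-coded.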