I have a Pandas dataframe that is setup like so:
Code StartDate EndDate
A 2024-07-01 2024-08-03
B 2024-08-06 2024-08-10
C 2024-08-11 2024-08-31
I have a part of my code that iterates through each day (starting from 2024-07-01) and I am trying to return the corresponding Code
given a date (with a fallback if the date does not fall within any StartDate/EndDate range).
My original idea was to do something like:
DAYS = DAY_DF['Date'].tolist() # Just a list of each day
for DAY in DAYS:
code = False
for i,r in df.iterrows():
if r['StartDate'] <= DAY <= r['EndDate']:
code = r['Code']
break
if not code: # `Code` is still False
code = 'Fallback_Code'
But this seems very inefficient to iterate over each row in the dataframe especially because I have a lot of records in my dataframe.
Here are some example inputs and the resulting code output:
2024-07-03 -> 'A'
2024-08-04 -> 'Fallback_Code'
2024-08-10 -> 'B'
2024-08-11 -> 'C'
A possible solution, which converts the StartDate
and EndDate
columns to datetime
format (to allow for comparison of dates). It then checks if a specific date (e.g., 2024-07-03
) falls within any of the date ranges defined by StartDate
and EndDate
. If it does, it retrieves the first corresponding Code
from those rows; if not, it returns Fallback code
.
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
date = '2025-07-03' # input example
m = df['StartDate'].le(date) & df['EndDate'].ge(date)
df.loc[m, 'Code'].iloc[0] if m.any() else 'Fallback code'
To get the codes for a list of dates, we might use the following:
dates = ['2024-07-03', '2024-08-04', '2024-08-10', '2024-08-11']
m = lambda x: df['StartDate'].le(x) & df['EndDate'].ge(x)
{date: df.loc[m(date), 'Code'].iloc[0] if m(date).any() else 'Fallback code'
for date in dates}
Output:
{'2024-07-03': 'A',
'2024-08-04': 'Fallback code',
'2024-08-10': 'B',
'2024-08-11': 'C'}