pythonpandas

Pandas Return Corresponding Column Based on Date Being Between Two Values


I have a Pandas dataframe that is setup like so:

Code    StartDate      EndDate
   A   2024-07-01   2024-08-03
   B   2024-08-06   2024-08-10
   C   2024-08-11   2024-08-31

I have a part of my code that iterates through each day (starting from 2024-07-01) and I am trying to return the corresponding Code given a date (with a fallback if the date does not fall within any StartDate/EndDate range).

My original idea was to do something like:

DAYS = DAY_DF['Date'].tolist() # Just a list of each day
for DAY in DAYS:
    code = False
    for i,r in df.iterrows():
        if r['StartDate'] <= DAY <= r['EndDate']:
            code = r['Code']
            break
    if not code: # `Code` is still False
        code = 'Fallback_Code'

But this seems very inefficient to iterate over each row in the dataframe especially because I have a lot of records in my dataframe.

Here are some example inputs and the resulting code output:

2024-07-03 -> 'A'
2024-08-04 -> 'Fallback_Code'
2024-08-10 -> 'B'
2024-08-11 -> 'C'

Solution

  • A possible solution, which converts the StartDate and EndDate columns to datetime format (to allow for comparison of dates). It then checks if a specific date (e.g., 2024-07-03) falls within any of the date ranges defined by StartDate and EndDate. If it does, it retrieves the first corresponding Code from those rows; if not, it returns Fallback code.

    df['StartDate'] = pd.to_datetime(df['StartDate'])
    df['EndDate'] = pd.to_datetime(df['EndDate'])
    
    date = '2025-07-03' # input example
    
    m = df['StartDate'].le(date) & df['EndDate'].ge(date)
    df.loc[m, 'Code'].iloc[0] if m.any() else 'Fallback code'
    

    To get the codes for a list of dates, we might use the following:

    dates = ['2024-07-03', '2024-08-04', '2024-08-10', '2024-08-11']
    
    m = lambda x: df['StartDate'].le(x) & df['EndDate'].ge(x)
    
    {date: df.loc[m(date), 'Code'].iloc[0] if m(date).any() else 'Fallback code' 
     for date in dates}
    

    Output:

    {'2024-07-03': 'A',
     '2024-08-04': 'Fallback code',
     '2024-08-10': 'B',
     '2024-08-11': 'C'}