pythonpandasloopsexecution-timepython-holidays

Python : my code is slow, using for loop with Pandas dataframe


I got this code below which works fine but takes too much time to execute (>30s):

df['Date_engagement'] = np.nan
for i in range(df.shape[0]):
if not(pd.isna(df.loc[i,'Engagement'])):
    df.loc[i, 'Date_engagement'] = np.busday_offset(
        df.loc[i, 'Date_réception'].date(),
        df.loc[i,'Engagement'], roll='backward', busdaycal=france_bdc
    )

I guess this is because I loop through the Pandas dataframe, and I wonder if there is a way to achieve the same result, not using any loop.

Basically what this code does is creating the 'Date_engagement' column which will be the result of adding 'Engagement' days to 'Date_réception' if *'Engagement' *is not empty. I use *'np.busday_offset()' *to avoid holidays.

Thanks in advance for your help...

I tried to use Pandas features for dataframes (like DataFrame.apply()), but can't find how to do it correctly to get the same result as explained.


Solution

  • Suppose the input dataframe:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'Date_réception': ['2023-05-12', '2023-05-13'],
                       'Engagement': [10, np.nan]})
    print(df)
    
    # Output
      Date_réception  Engagement
    0     2023-05-12        10.0
    1     2023-05-13         NaN
    

    You can vectorize your code:

    france_bdc = np.busdaycalendar(weekmask='1111100', holidays=['2023-05-18'])
    m = df['Engagement'].notna()
    
    df.loc[m, 'Date_engagement'] = (
        np.busday_offset(df.loc[m, 'Date_réception'].values.astype('datetime64[D]'),
                         df.loc[m, 'Engagement'], roll='backward', busdaycal=france_bdc)
    )
    

    Output:

    >>> df
      Date_réception  Engagement Date_engagement
    0     2023-05-12        10.0      2023-05-29
    1     2023-05-13         NaN             NaT