pythondataframememory-efficient

Efficient way to iterate through df rows


I am trying to generate last day of the month for a date field in my dataframe: so there is field start date, I want to add say 5 columns like nep_0, nep_1, nep_2, nep_3, nep_4.

So for nep_0 logic is as below: last day ( policy start date+ relative months (0))

For nep_1: last day(policy start date+ relative months (1)).

Was using this: date_after_month =pol_start_date + relativedelta(months=i) nep_date=datetime.datetime(date_after_month.year,date_after_month.month,calendar.monthrange(date_after_month.year, date_after_month.month)[1])

But this used to iterate through each row of the data frame. My data frame has 1.5 million rows and it is taking a lot of time. can you tell me a better way to achieve the same. may be using apply or map.

I would also like these columns (nep_0, nep_1) to be created dynamically as in not passing relative months argument of 0,1,2,. they should also be looped through.


Solution

  • You can use MonthEnd and rollforward:

    from pandas.tseries.offsets import MonthEnd
    from datetime import datetime
    
    month_count = 5 #define count of months here
    cols = ["nep_" + str(i) for i in range(month_count)] #['nep_0', 'nep_1', 'nep_2', 'nep_3', 'nep_4']
    
    for n,i in enumerate(cols):
        if i == 'nep_0':
            df[i] = df["dates"].apply(lambda x: MonthEnd().rollforward(x)) #get current month's last day.
        else:
            df[i] = df["nep_0"] + MonthEnd(n)
    

    Testing:

    df = pd.DataFrame({"dates":[datetime(2023,10,31),datetime(2023,10,30)]})
    '''
           dates
    0 2023-10-31
    1 2023-10-30
    '''
    

    Now let's use Monthend:

    for n,i in enumerate(cols):
        if i == 'nep_0':
            df[i] = df["dates"].apply(lambda x: MonthEnd().rollforward(x)) #get current month's last day.
        else:
            df[i] = df["nep_0"] + MonthEnd(n)
    

    Out::

           dates      nep_0      nep_1      nep_2      nep_3      nep_4
    0 2023-10-31 2023-10-31 2023-11-30 2023-12-31 2024-01-31 2024-02-29
    1 2023-10-30 2023-10-31 2023-11-30 2023-12-31 2024-01-31 2024-02-29