
Pandas Dataframe with mixed periods - 1M, 6M and 12M - to Dataframe with one period only

I have a dataframe with income for a certain period of time. Each period is given by a start date and an end date (for example, 2023-04-01 and 2023-06-30), and its length can be 3, 6 or 12 months. My goal is to bring everything to the same period, one row per month, but I have not found an easy way to do it.

This is how I am doing it now:

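        import pandas as pd
        from dateutil.relativedelta import relativedelta

        new_rows = []
        for _, row in df.iterrows():
            for i in range(row["date_diff"]):
                new_row = row.copy()
                # Month-end date of the i-th month in the period
                new_row["new_date"] = row["date"] + relativedelta(months=i) + pd.offsets.MonthEnd()
                # Spread the revenue evenly over the months of the period
                new_row["new_net_revenues"] = row["net_revenues"] / row["date_diff"]
                new_rows.append(new_row)
        new_df = pd.DataFrame(new_rows)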

This obviously takes a very long time for larger dataframes. Is there a much faster solution using built-in functions? Essentially, it is a kind of explode over the months in the period. I also tried concatenating each row as many times as there are months in its period and doing the math afterwards, but that is still not very elegant.


The code above transforms this:

   net_revenues       date    date_to  date_diff
0          0.02 2023-04-01 2023-06-30          3
1          0.01 2023-04-01 2023-06-30          3
2          0.02 2023-01-01 2023-03-31          3

into this:

   new_net_revenues   new_date
0          0.006667 2023-04-30
0          0.006667 2023-05-31
0          0.006667 2023-06-30
1          0.003333 2023-04-30
1          0.003333 2023-05-31
1          0.003333 2023-06-30
2          0.006667 2023-01-31
2          0.006667 2023-02-28
2          0.006667 2023-03-31
3          0.003333 2023-01-31
3          0.003333 2023-02-28
3          0.003333 2023-03-31


  • You can repeat each row once per month with Index.repeat, number the repeated rows with GroupBy.cumcount, add that counter to the start month, and convert the resulting monthly periods to datetimes with Series.dt.to_timestamp:

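    import pandas as pd

    df = pd.DataFrame({'date':['2023-04-01', '2023-01-01'],
                       'date_to':['2023-06-30', '2023-06-30'],
                       'net_revenues':[100, 20]})
    # the .dt accessor used below needs datetime columns
    df['date'] = pd.to_datetime(df['date'])
    df['date_to'] = pd.to_datetime(df['date_to'])
    print (df)
            date    date_to  net_revenues
    0 2023-04-01 2023-06-30           100
    1 2023-01-01 2023-06-30            20

    periods = df.date.dt.to_period("M")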
    diff = (df.date_to.dt.to_period("M") - periods).apply(lambda x: x.n) + 1
    new_df = df.loc[df.index.repeat(diff)]
    new_df['new_date'] = (periods + new_df.groupby(level=0).cumcount()).dt.to_timestamp("M")
    new_df["new_net_revenues"] = new_df["net_revenues"] / diff
    print (new_df)
            date    date_to  net_revenues   new_date  new_net_revenues
    0 2023-04-01 2023-06-30           100 2023-04-30         33.333333
    0 2023-04-01 2023-06-30           100 2023-05-31         33.333333
    0 2023-04-01 2023-06-30           100 2023-06-30         33.333333
    1 2023-01-01 2023-06-30            20 2023-01-31          3.333333
    1 2023-01-01 2023-06-30            20 2023-02-28          3.333333
    1 2023-01-01 2023-06-30            20 2023-03-31          3.333333
    1 2023-01-01 2023-06-30            20 2023-04-30          3.333333
    1 2023-01-01 2023-06-30            20 2023-05-31          3.333333
    1 2023-01-01 2023-06-30            20 2023-06-30          3.333333
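
Since the question describes the task as "kind of an explode", here is a minimal sketch of that variant using pd.period_range and DataFrame.explode. It is a different technique from the Index.repeat approach above, not a replacement for it. The helper name explode_to_months is hypothetical, and the sketch assumes that date and date_to are already datetime columns and that the revenue should be split evenly across the months.

```python
import pandas as pd


def explode_to_months(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per calendar month in [date, date_to]."""
    out = frame.copy()
    # For every row, list the monthly periods between its start and end date.
    months = [
        list(pd.period_range(start, end, freq="M"))
        for start, end in zip(out["date"], out["date_to"])
    ]
    out["new_date"] = months
    # Split the revenue evenly across the months of each row.
    out["new_net_revenues"] = out["net_revenues"] / [len(m) for m in months]
    # explode() turns each list element into its own row and repeats the index.
    out = out.explode("new_date")
    # Convert the monthly periods to month-end timestamps at midnight.
    out["new_date"] = (
        pd.PeriodIndex(out["new_date"], freq="M").to_timestamp(how="end").normalize()
    )
    return out


df = pd.DataFrame({
    "date": pd.to_datetime(["2023-04-01", "2023-01-01"]),
    "date_to": pd.to_datetime(["2023-06-30", "2023-06-30"]),
    "net_revenues": [100.0, 20.0],
})
new_df = explode_to_months(df)
print(new_df[["new_date", "new_net_revenues"]])
```

The list comprehension still loops over the rows in Python, so on large dataframes the Index.repeat version is likely to be faster; the explode variant is mainly easier to read. A quick sanity check for either approach is that new_df.groupby(level=0)["new_net_revenues"].sum() reproduces the original net_revenues.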