python pandas datetime resampling

Resampling Partial Month to Full End of Month - Pandas


Putting this out there to see if there is a feasible solution for this.

Suppose I have a volume for every month that looks like this:

[image: monthly volumes]

But suppose the start date for these volumes falls partway through a month, say 7/15/2023 in this example.

[image: volumes starting 7/15/2023]

Is there a way to resample the data so that it sums the correct volumes through the end of each month? The answer should look like this:

[image: expected result]

I tried a simple resample, but it just shifts the dates to the end of the month rather than splitting the volumes the way I want. Any ideas?

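import pandas as pd
df = pd.DataFrame({"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]})
df["DATE"] = pd.to_datetime(df["DATE"])
# This only relabels each row with its month-end date; the volumes are not split across months.
# (pandas 2.2+ prefers the alias "ME" over "M".)
df = df.resample("M", on="DATE").sum().reset_index()
print(df)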

Solution

  • IIUC, you can do:

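    import calendar

    import pandas as pd

    df = pd.DataFrame(
        {"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]}
    )


    def get_days_in_month(year, month):
        # monthrange returns (weekday of the 1st, number of days in the month)
        return calendar.monthrange(year, month)[1]

    df["DATE"] = pd.to_datetime(df["DATE"])

    # Days from each start date to the end of its month (16 for 2023-07-15)
    days_to_month_end = ((df["DATE"] + pd.offsets.MonthEnd()) - df["DATE"]).dt.days
    days_in_month = df["DATE"].apply(lambda x: get_days_in_month(x.year, x.month))

    # Portion of each volume that stays in the month where it starts
    tmp = df["VOL"] * days_to_month_end / days_in_month
    # Add the remainder carried over from the previous row (0 for the first row)
    df["RESULT"] = (df["VOL"].shift().fillna(0) - tmp.shift().fillna(0)) + tmp
    print(df)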
    

    Prints:

            DATE  VOL      RESULT
    0 2023-07-15  100   51.612903
    1 2023-08-15  150  125.806452
    2 2023-09-15  100  122.580645
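
For readers who want the same arithmetic as a reusable function, here is a minimal sketch. It assumes one row per consecutive month, sorted by date, as in the example. The name `prorate_to_month_end` and its parameters are made up for illustration, and `Series.dt.days_in_month` stands in for the `calendar` helper. As in the answer above, whatever remains of the last row's volume (the part that would fall in October) is dropped.

```python
import pandas as pd


def prorate_to_month_end(df, date_col="DATE", vol_col="VOL"):
    """Split each volume pro rata between its start month and the next month.

    Hypothetical helper: assumes one row per consecutive month, sorted by date.
    """
    dates = pd.to_datetime(df[date_col])
    # Days from each start date to the end of its month (16 for 2023-07-15).
    days_left = ((dates + pd.offsets.MonthEnd()) - dates).dt.days
    # Share of each volume that stays in the month where it starts.
    kept = df[vol_col] * days_left / dates.dt.days_in_month
    # The rest of the previous row's volume carries into this row's month.
    carried = (df[vol_col] - kept).shift(fill_value=0)
    return kept + carried


df = pd.DataFrame(
    {"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]}
)
df["RESULT"] = prorate_to_month_end(df)
print(df)
```

For July, 16 of the 31 days remain after 7/15, so 100 * 16 / 31 ≈ 51.61 stays in July and the other 48.39 moves into August, where it is added to August's own share of 150 * 16 / 31 ≈ 77.42.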