pythonpandasdatetimepython-dateutil

Shifting daily data exactly one month forward (not 30 days)


I want to shift the date index of my daily data forward by one month. This is in order to calculate a rolling month-on-month change on the daily data by calculating on two dataframes - the existing one with the correct time stamps and a new one with the time stamps shifted forward one month with all the other data columns the same

The problem is that not all months have the same length, so I don't want to roll it forward by 30 days. I would like for example the 24th of March to match with 24th of February (28 days), and the 24th of April to match the 24th March (31 days).

I tried variations of the following with no joy:

df= pd.read_csv("seven_day.csv", parse_dates=['Date'], index_col=['Date'])

df.shift(periods=1, freq="M")
df.shift(periods=1, freq="MS")

What happens is I get the end or start of the next month for ALL of the days in the current month. Eg all March daily data days go to the end or start of April.

Would massively appreciate any insight.


Solution

  • You can use a DateOffset:

    df.set_index(df.index+pd.DateOffset(months=1))
    

    Example:

    df = pd.DataFrame({'data': range(6)},
                       index=pd.to_datetime(['2022-12-31', '2023-01-24',
                                             '2023-01-31', '2023-02-24',
                                             '2023-02-28', '2023-03-31']))
    
    df.join(df.set_index(df.index+pd.DateOffset(months=1)).add_suffix('_shift'))
    

    Output:

                data  data_shift
    2022-12-31     0         NaN
    2023-01-24     1         NaN
    2023-01-31     2         0.0 # matches 2021-12-31
    2023-02-24     3         1.0 # matches 2023-01-24
    2023-02-28     4         2.0 # matches 2023-01-31
    2023-03-31     5         NaN
    

    Intermediate:

    df.set_index(df.index+pd.DateOffset(months=1)).add_suffix('_shift')
    
                data_shift
    2023-01-31           0
    2023-02-24           1
    2023-02-28           2
    2023-03-24           3
    2023-03-28           4
    2023-04-30           5
    

    Other way around (note the - sign):

    df.join(df.set_index(df.index-pd.DateOffset(months=1)).add_suffix('_shift'))
    # or
    df.join(df.set_index(df.index+pd.DateOffset(months=-1)).add_suffix('_shift'))
    

    Output:

                data  data_shift
    2022-12-31     0         2.0 # matches 2021-01-31
    2023-01-24     1         3.0 # matches 2021-02-24
    2023-01-31     2         NaN
    2023-02-24     3         NaN
    2023-02-28     4         5.0 # matches 2023-03-31
    2023-03-31     5         NaN
    

    Intermediate:

    df.set_index(df.index-pd.DateOffset(months=1)).add_suffix('_shift')
    
                data_shift
    2022-11-30           0
    2022-12-24           1
    2022-12-31           2
    2023-01-24           3
    2023-01-28           4
    2023-02-28           5