pythonpandasdiffdifferenceperiod

How to find how the values have changed from the begining of their period?


I have a dataframe with timeindex. There is a period (cycle) that starts every august. I want to calculate the difference between the value of each month and the value that was on the previous august (the beginning of its period). The aim is to know how much the values have changed during each cycle.

This is an small example of the data:

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': ['2022-01-01 00:00:00', '2022-02-01 00:00:00', 
                          '2021-03-01 00:00:00', '2021-04-01 00:00:00',
                          '2021-05-01 00:00:00', '2021-06-01 00:00:00',
                          '2021-07-01 00:00:00', '2021-08-01 00:00:00',
                          '2021-09-01 00:00:00', '2021-10-01 00:00:00',
                          '2021-11-01 00:00:00', '2021-12-01 00:00:00',
                          '2022-01-01 00:00:00', '2022-02-01 00:00:00',
                          '2022-03-01 00:00:00', '2022-04-01 00:00:00',
                          '2022-05-01 00:00:00', '2022-06-01 00:00:00',
                          '2022-07-01 00:00:00', '2022-08-01 00:00:00',
                          '2022-09-01 00:00:00', '2022-10-01 00:00:00',
                          '2022-11-01 00:00:00', '2023-12-01 00:00:00',
                          '2023-01-01 00:00:00', '2023-02-01 00:00:00',
                          '2023-03-01 00:00:00', '2023-04-01 00:00:00',
                          '2023-05-01 00:00:00', '2023-06-01 00:00:00',
                          ], 
               'value1': [0.452762281,0.372262281,0.513928948,0.447762281,
                            0.377095615,0.355095615,0.271428948,0.291762281,
                            0.476762281,0.335928948,0.280428948,0.283762281,
                            0.322928948,0.287262281,0.316928948,0.209262281,
                            0.407928948,0.254262281,0.232095615,0.264262281,
                            0.076095615,-0.025237719,-0.042237719,-0.094904385,
                            0.017428948,-0.036071052,-0.094071052,-0.071404385,
                            0.008095615,-0.141571052],
               'value2': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
                               6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
                               1.6,0.4,1.5,1.4,2.2,1.2]})
df = df.set_index('date')

I have tried to get a dataframe with the same structure but all the values have the value that was on its corresponding august, and then I just need to subtract one dataframe from the other. But I couldn't find a way to do it neither.

Thank you vary much in advance.


Solution

  • If I understand you correctly, you can use .groupby() and then calculate the difference in each group:

    df.index = pd.to_datetime(df.index)
    
    tmp = (df.index.month == 8).cumsum()
    out = (
        df[tmp != 0]
        .groupby(tmp[tmp > 0], group_keys=False)
        .apply(lambda x: x[["value1", "value2"]] - x[["value1", "value2"]].iloc[0])
    ).add_prefix('calculated_')
    
    x = df.join(out)
    print(x)
    

    Prints:

                  value1  value2  calculated_value1  calculated_value2
    date                                                              
    2021-03-01  0.513929     8.4                NaN                NaN
    2021-04-01  0.447762     6.2                NaN                NaN
    2021-05-01  0.377096     6.2                NaN                NaN
    2021-06-01  0.355096     6.0                NaN                NaN
    2021-07-01  0.271429     3.9                NaN                NaN
    2021-08-01  0.291762     8.5           0.000000                0.0
    2021-09-01  0.476762     8.3           0.185000               -0.2
    2021-10-01  0.335929     5.3           0.044167               -3.2
    2021-11-01  0.280429     5.6          -0.011333               -2.9
    2021-12-01  0.283762     5.3          -0.008000               -3.2
    2022-01-01  0.452762     9.6           0.031167               -2.3
    2022-01-01  0.322929     6.2           0.031167               -2.3
    2022-02-01  0.372262     8.0          -0.004500               -2.2
    2022-02-01  0.287262     6.3          -0.004500               -2.2
    2022-03-01  0.316929     6.9           0.025167               -1.6
    2022-04-01  0.209262     4.8          -0.082500               -3.7
    2022-05-01  0.407929     6.7           0.116167               -1.8
    2022-06-01  0.254262     3.6          -0.037500               -4.9
    2022-07-01  0.232096     3.0          -0.059667               -5.5
    2022-08-01  0.264262     4.6           0.000000                0.0
    2022-09-01  0.076096     2.3          -0.188167               -2.3
    2022-10-01 -0.025238     1.3          -0.289500               -3.3
    2022-11-01 -0.042238     1.0          -0.306500               -3.6
    2023-01-01  0.017429     1.6          -0.246833               -3.0
    2023-02-01 -0.036071     0.4          -0.300333               -4.2
    2023-03-01 -0.094071     1.5          -0.358333               -3.1
    2023-04-01 -0.071404     1.4          -0.335667               -3.2
    2023-05-01  0.008096     2.2          -0.256167               -2.4
    2023-06-01 -0.141571     1.2          -0.405833               -3.4
    2023-12-01 -0.094904     0.3          -0.359167               -4.3