pythonpandas

Pandas pct_change but loop back to start


I'm looking at how to use the pandas pct_change() function, but I need the values 'wrap around', so the last and first values create a percent change value in position 0 rather than NaN.

For example:

df = pd.DataFrame({'Month':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'Value':[1, 0.9, 0.8, 0.75, 0.75, 0.8, 0.7, 0.65, 0.7, 0.8, 0.85, 0.9]})

    Month  Value
0       1   1.00
1       2   0.90
2       3   0.80
3       4   0.75
4       5   0.75
5       6   0.80
6       7   0.70
7       8   0.65
8       9   0.70
9      10   0.80
10     11   0.85
11     12   0.90

Using pct_change() + 1 gives:

df['percent change'] = df['Value'].pct_change() + 1
    Month  Value  percent change
0       1   1.00             NaN
1       2   0.90        0.900000
2       3   0.80        0.888889
3       4   0.75        0.937500
4       5   0.75        1.000000
5       6   0.80        1.066667
6       7   0.70        0.875000
7       8   0.65        0.928571
8       9   0.70        1.076923
9      10   0.80        1.142857
10     11   0.85        1.062500
11     12   0.90        1.058824

However I also need to know the % change between December (moth=12) and January (month=1), so the NaN should be 1.111111. I hope to eventually do this to several groups within a group by, so muddling about filling in the Nan with one value over the other, or manually calculating all the percentages seems a long winded way to do it. Is there a simpler way to achieve this?


Solution

  • Just use numpy.roll that is designed for this specific purpose:

    import numpy as np
    
    df['percent change'] = df['Value'].div(np.roll(df['Value'], 1))
    

    Output:

        Month  Value  percent change
    0       1   1.00        1.111111
    1       2   0.90        0.900000
    2       3   0.80        0.888889
    3       4   0.75        0.937500
    4       5   0.75        1.000000
    5       6   0.80        1.066667
    6       7   0.70        0.875000
    7       8   0.65        0.928571
    8       9   0.70        1.076923
    9      10   0.80        1.142857
    10     11   0.85        1.062500
    11     12   0.90        1.058824
    

    If you need to perform this per group, combine it with groupby.transform:

    import numpy as np
    
    df['percent change'] = df['Value'].div(df.groupby('Group')['Value'].transform(lambda x: np.roll(x, 1)))
    

    Output:

        Month  Value  Group  percent change
    0       1   1.00      1        1.111111
    1       2   0.90      1        0.900000
    2       3   0.80      1        0.888889
    3       4   0.75      1        0.937500
    4       5   0.75      1        1.000000
    5       6   0.80      1        1.066667
    6       7   0.70      1        0.875000
    7       8   0.65      1        0.928571
    8       9   0.70      1        1.076923
    9      10   0.80      1        1.142857
    10     11   0.85      1        1.062500
    11     12   0.90      1        1.058824
    12      1   1.00      2        1.111111
    13      2   0.90      2        0.900000
    14      3   0.80      2        0.888889
    15      4   0.75      2        0.937500
    16      5   0.75      2        1.000000
    17      6   0.80      2        1.066667
    18      7   0.70      2        0.875000
    19      8   0.65      2        0.928571
    20      9   0.70      2        1.076923
    21     10   0.80      2        1.142857
    22     11   0.85      2        1.062500
    23     12   0.90      2        1.058824