I have timeseries data with datetimeindex, price and the cycle phase. My dataframe looks like this:
Dateindex | Price | CyclePhase | Day |
---|---|---|---|
1928-01-03 00:00:00 | 71.04 | 0 | 1 |
1928-01-04 00:00:00 | 70.88 | 0 | 2 |
1928-01-05 00:00:00 | 70.2 | 0 | 3 |
1928-01-06 00:00:00 | 70.64 | 0 | 4 |
… | … | … | … |
1929-05-09 00:00:00 | 104.08 | 1 | 400 |
1929-05-10 00:00:00 | 105.36 | 1 | 401 |
1929-05-11 00:00:00 | 104.96 | 1 | 402 |
1929-05-13 00:00:00 | 102.56 | 1 | 403 |
… | … | … | … |
1930-11-08 00:00:00 | 63.56 | 2 | 844 |
1930-11-10 00:00:00 | 62.16 | 2 | 845 |
1930-11-11 00:00:00 | 63.16 | 2 | 846 |
… | … | … | … |
1931-12-29 00:00:00 | 31.84 | 3 | 1185 |
1931-12-30 00:00:00 | 32.4 | 3 | 1186 |
1931-12-31 00:00:00 | 32.48 | 3 | 1187 |
1932-01-02 00:00:00 | 31.28 | 0 | 1 |
1932-01-04 00:00:00 | 30.24 | 0 | 2 |
1932-01-05 00:00:00 | 30.2 | 0 | 3 |
… | … | … | … |
I would like to create the column "Day" where it is counting up for each row in the dataframe until the cycle restarts. (when CyclePhase changes from 3 to 0 again).
How can I do that best in python?
Create grouper to flag the rows where cycle changes the calculate cumcount to assign row numbers per group
s = df['CyclePhase'].diff().lt(0).cumsum()
df['Day'] = s.groupby(s).cumcount() + 1
Dateindex Price CyclePhase Day
0 1928-01-03 00:00:00 71.04 0 1
1 1928-01-04 00:00:00 70.88 0 2
2 1928-01-05 00:00:00 70.20 0 3
3 1928-01-06 00:00:00 70.64 0 4
4 1929-05-09 00:00:00 104.08 1 5
5 1929-05-10 00:00:00 105.36 1 6
6 1929-05-11 00:00:00 104.96 1 7
7 1929-05-13 00:00:00 102.56 1 8
8 1930-11-08 00:00:00 63.56 2 9
9 1930-11-10 00:00:00 62.16 2 10
10 1930-11-11 00:00:00 63.16 2 11
11 1931-12-29 00:00:00 31.84 3 12
12 1931-12-30 00:00:00 32.40 3 13
13 1931-12-31 00:00:00 32.48 3 14
14 1932-01-02 00:00:00 31.28 0 1
15 1932-01-04 00:00:00 30.24 0 2
16 1932-01-05 00:00:00 30.20 0 3