Here's a dataframe, with some parameters for each id by some regular quarterly dates. It's originally shuffled randomly, but, at first, let say, it is sorted by fab_date and id.
import pandas as pd
np.random.seed(1)
dt_to_fun = pd.DataFrame({
'fab_date': pd.to_datetime(["2022-01-01", "2022-07-01", "2023-01-01", "2023-07-01",
"2022-01-01", "2022-07-01", "2023-01-01", "2023-07-01",
"2022-01-01", "2022-07-01", "2023-01-01", "2023-07-01"]),
'id': ['n_01', 'n_01', 'n_01', 'n_01', 'n_02', 'n_02', 'n_02', 'n_02', 'n_03', 'n_03', 'n_03', 'n_03'],
'param_01': np.random.choice([10, 20, 30], size=12),
'param_02': np.random.choice([10, 30, 50], size=12)
})
dt_to_fun
fab_date id param_01 param_02
0 2022-01-01 n_01 20 30
1 2022-07-01 n_01 10 50
2 2023-01-01 n_01 10 10
3 2023-07-01 n_01 20 50
4 2022-01-01 n_02 20 30
5 2022-07-01 n_02 10 50
6 2023-01-01 n_02 10 10
7 2023-07-01 n_02 20 10
8 2022-01-01 n_03 10 50
9 2022-07-01 n_03 20 10
10 2023-01-01 n_03 10 30
11 2023-07-01 n_03 30 50
The goal is: for each id replace (last date) param value like this: param_01(last date) = param_01(last date) + param_01(-1 date) - param_01(-2 date) For example, id n_03 last value of param_02 is 50, previous is 30, two times earlier is 10, so the result should be 50 + 30 - 10 = 70. The same for each id and all param* columns. Only parameters in rows with last date for each id should be updated.
Resulting table should be like this:
dt_to_fun
fab_date id param_01 param_02
0 2022-01-01 n_01 20 30
1 2022-07-01 n_01 10 50
2 2023-01-01 n_01 10 10
3 2023-07-01 n_01 20 10
4 2022-01-01 n_02 20 30
5 2022-07-01 n_02 10 50
6 2023-01-01 n_02 10 10
7 2023-07-01 n_02 20 -30
8 2022-01-01 n_03 10 50
9 2022-07-01 n_03 20 10
10 2023-01-01 n_03 10 30
11 2023-07-01 n_03 20 70
I've tried applymap, shift etc, but can not figure out, how this all should work together. Would appreciate any help.
IIUC, use a custom groupby.transform
with diff
and shift
:
cols = ['param_01', 'param_02']
dt_to_fun.loc[~dt_to_fun['id'].duplicated(keep='last'), cols] = (
dt_to_fun.groupby('id')[cols]
.transform(lambda x: x.diff().shift())
.add(dt_to_fun[cols])
)
Or with multiple groupby.shift
:
cols = ['param_01', 'param_02']
g = dt_to_fun.groupby('id')[cols]
dt_to_fun.loc[~dt_to_fun['id'].duplicated(keep='last'), cols] = (
dt_to_fun[cols] + g.shift(1) - g.shift(2)
)
Output:
fab_date id param_01 param_02
0 2022-01-01 n_01 20 30
1 2022-07-01 n_01 10 50
2 2023-01-01 n_01 10 10
3 2023-07-01 n_01 20 10
4 2022-01-01 n_02 20 30
5 2022-07-01 n_02 10 50
6 2023-01-01 n_02 10 10
7 2023-07-01 n_02 20 -30
8 2022-01-01 n_03 10 50
9 2022-07-01 n_03 20 10
10 2023-01-01 n_03 10 30
11 2023-07-01 n_03 20 70