Using this code,
import numpy as np
import pandas as pd
df = pd.DataFrame ({'Date':['2000-01-01', '2000-02-01', '2000-03-01',
'2000-01-01', '2000-02-01', '2000-03-01','2000-04-01'
],
'id':['1', '1', '1', '2', '2', '2','2'],
'bal_tot':[10, 20, 30, 40, 50, 60,70],
'bal_d1_pct': ['nan', 1, 2, 'nan', 3, 4, 5]
})
I need to create a new variable 'fore' for each id. For example,
if Date = '2000-01-01' and id=1 then fore = 10
if Date = '2000-02-01' and id=1 then fore = 10 *(1+1/100) = 10.1
if Date = '2000-03-01' and id = 1 then fore = 10.1 * (1+2/100) = 10.302
etc.
The final data should look like this:
How to do it?
You can do a groupby, but you need to have your bal_d1_pct
to be numerical first:
df.bal_d1_pct = pd.to_numeric(df.bal_d1_pct, errors='coerce').fillna(0)/100 + 1
df['fore'] = (df.groupby('id')
.apply(lambda x: x.bal_tot.iloc[0] * x.bal_d1_pct.cumprod())
.reset_index('id',drop=True)
)
Output:
Date id bal_tot bal_d1_pct fore
0 2000-01-01 1 10 1.00 10.0000
1 2000-02-01 1 20 1.01 10.1000
2 2000-03-01 1 30 1.02 10.3020
3 2000-01-01 2 40 1.00 40.0000
4 2000-02-01 2 50 1.03 41.2000
5 2000-03-01 2 60 1.04 42.8480
6 2000-04-01 2 70 1.05 44.9904