I have a dataframe with multiple index and would like to create a rolling sum of some data, but for each id in the index.
For instance, let us say I have two indexes (Firm and Year) and I have some data with name zdata. The working example is the following:
import pandas as pd
# generating data
firms = ['firm1']*5+['firm2']*5
years = [2000+i for i in range(5)]*2
zdata = [1 for i in range(10)]
# Creating the dataframe
mydf = pd.DataFrame({'firms':firms,'year':years,'zdata':zdata})
# Setting the two indexes
mydf.set_index(['firms','year'],inplace=True)
print(mydf)
zdata
firms year
firm1 2000 1
2001 1
2002 1
2003 1
2004 1
firm2 2000 1
2001 1
2002 1
2003 1
2004 1
And now, I would like to have a rolling sum that starts over for each firm. However, if I type
new_rolling_df=mydf.rolling(window=2).sum()
print(new_rolling_df)
zdata
firms year
firm1 2000 NaN
2001 2.0
2002 2.0
2003 2.0
2004 2.0
firm2 2000 2.0
2001 2.0
2002 2.0
2003 2.0
2004 2.0
It doesn't take into account the multiple index and just make a normal rolling sum. Anyone has an idea how I should do (especially since I have even more indexes than 2 (firm, worker, country, year)
Thanks,
Adrien
Option 1
mydf.unstack(0).rolling(2).sum().stack().swaplevel(0, 1).sort_index()
Option 2
mydf.groupby(level=0, group_keys=False).rolling(2).sum()