pythonpandaspanelmulti-indexrolling-sum

Pandas: Rolling sum with multiple indexes (i.e. panel data)


I have a dataframe with multiple index and would like to create a rolling sum of some data, but for each id in the index.

For instance, let us say I have two indexes (Firm and Year) and I have some data with name zdata. The working example is the following:

import pandas as pd

# generating data
firms = ['firm1']*5+['firm2']*5
years = [2000+i for i in range(5)]*2
zdata = [1 for i in range(10)]

# Creating the dataframe
mydf  = pd.DataFrame({'firms':firms,'year':years,'zdata':zdata})

# Setting the two indexes
mydf.set_index(['firms','year'],inplace=True)

print(mydf)
             zdata
firms year       
firm1 2000      1
      2001      1
      2002      1
      2003      1
      2004      1
firm2 2000      1
      2001      1
      2002      1
      2003      1
      2004      1

And now, I would like to have a rolling sum that starts over for each firm. However, if I type

new_rolling_df=mydf.rolling(window=2).sum()

print(new_rolling_df)
              zdata
  firms year       
  firm1 2000    NaN
        2001    2.0
        2002    2.0
        2003    2.0
        2004    2.0
  firm2 2000    2.0
        2001    2.0
        2002    2.0
        2003    2.0
        2004    2.0

It doesn't take into account the multiple index and just make a normal rolling sum. Anyone has an idea how I should do (especially since I have even more indexes than 2 (firm, worker, country, year)

Thanks,

Adrien


Solution

  • Option 1

    mydf.unstack(0).rolling(2).sum().stack().swaplevel(0, 1).sort_index()
    

    enter image description here

    Option 2

    mydf.groupby(level=0, group_keys=False).rolling(2).sum()
    

    enter image description here