I have difficulty to use pd.rolling_mean function on the following data frame containing duplicated indices:
amount
20140101 3
20140102 4
20140103 3
20140103 5
20140103 1
20140104 5
20140105 6
20140106 2
…
I need to calculate the 3 day average of 'amount', for example, average from 20140101
to 20140103
should be (3+4+3+5+1)/5=3.2
, the average of amount from 20140104 to 20140106 should be (5+6+2)/3=4.3
does anyone know how to do it? Thank you in advance!
you can do:
>>> df
amount
20140101 3
20140102 4
20140103 3
20140103 5
20140103 1
20140104 5
20140105 6
20140106 2
>>> xf = df.groupby(level=0)['amount'].agg(['sum', 'count'])
>>> xf
sum count
20140101 3 1
20140102 4 1
20140103 9 3
20140104 5 1
20140105 6 1
20140106 2 1
>>> pd.rolling_sum(xf['sum'], 3, 0) / pd.rolling_sum(xf['count'], 3, 0)
20140101 3.000
20140102 3.500
20140103 3.200
20140104 3.600
20140105 4.000
20140106 4.333
dtype: float64
and you get 3.2
and 4.3
for 20140103
and 20140106
respectively.