I want to have expanding mean give the result excluding the current item, ie, the items prior average. Here's what i'm looking for:
d = { 'home' : ['A', 'B', 'B', 'A', 'B', 'A', 'A'], 'away' : ['B', 'A','A', 'B', 'A', 'B', 'B'],
'aw' : [1,0,0,0,1,0,np.nan],'hw' : [0,1,0,1,0,1, np.nan]}
df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2.index = range(1,len(df2) + 1)
df2['homewin_at_home'] = df2.groupby('home')['hw'].apply(pd.expanding_mean)
print df2
Result:
home away hw aw homewin_at_home
1 A B 0 1 0.000000
2 B A 1 0 1.000000
3 B A 0 0 0.500000
4 A B 1 0 0.500000
5 B A 0 1 0.333333
6 A B 1 0 **0.666667**
7 A B NaN NaN 0.666667
The number highlighted should be 0.5
as: 'A' won 1 of 2 games home prior to this entry at index = 6. Instead the result includes this game yielding 0.66
. Actual output should be:
home away hw aw homewin_at_home
1 A B 0 1 NaN
2 B A 1 0 NaN
3 B A 0 0 1.000000
4 A B 1 0 0.000000
5 B A 0 1 0.500000
6 A B 1 0 0.500000
7 A B NaN NaN 0.666667
Things i've tried include adding .shift(1)
and trying to slice [:-1]
in the groupby but I can't get it to work. Also thought of introducing a helper column but couldn't work out how to keep the original index.
I asked a related question here but I prefer this method to the group-apply-split routine. Any help appreciated.
Is this what you're looking for? Calculates the expanding_mean
and shifts
the result.
df['homewin_at_home'] = df.groupby('home')['hw'].apply(lambda x: pd.expanding_mean(x).shift())
or, for more recent versions of pandas:
df['homewin_at_home'] = df.groupby('home')['hw'].apply(lambda x: x.expanding().mean().shift())
home away hw aw homewin_at_home
1 A B 0 1 NaN
2 B A 1 0 NaN
3 B A 0 0 1.000000
4 A B 1 0 0.000000
5 B A 0 1 0.500000
6 A B 1 0 0.500000
7 A B NaN NaN 0.666667