pythonpandasaverage

Dataframe: shift expanding mean with groupby


I want to have expanding mean give the result excluding the current item, ie, the items prior average. Here's what i'm looking for:

d = { 'home' : ['A', 'B', 'B', 'A', 'B', 'A', 'A'], 'away' : ['B', 'A','A', 'B', 'A', 'B', 'B'],
 'aw' : [1,0,0,0,1,0,np.nan],'hw' : [0,1,0,1,0,1, np.nan]}

df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2.index = range(1,len(df2) + 1)
df2['homewin_at_home'] = df2.groupby('home')['hw'].apply(pd.expanding_mean)

print df2

Result:

  home away  hw  aw  homewin_at_home
1    A    B   0   1         0.000000
2    B    A   1   0         1.000000
3    B    A   0   0         0.500000
4    A    B   1   0         0.500000
5    B    A   0   1         0.333333
6    A    B   1   0         **0.666667**
7    A    B NaN NaN         0.666667

The number highlighted should be 0.5 as: 'A' won 1 of 2 games home prior to this entry at index = 6. Instead the result includes this game yielding 0.66. Actual output should be:

  home away  hw  aw  homewin_at_home
1    A    B   0   1              NaN
2    B    A   1   0              NaN
3    B    A   0   0         1.000000
4    A    B   1   0         0.000000
5    B    A   0   1         0.500000
6    A    B   1   0         0.500000
7    A    B NaN NaN         0.666667

Things i've tried include adding .shift(1) and trying to slice [:-1] in the groupby but I can't get it to work. Also thought of introducing a helper column but couldn't work out how to keep the original index.

I asked a related question here but I prefer this method to the group-apply-split routine. Any help appreciated.


Solution

  • Is this what you're looking for? Calculates the expanding_mean and shifts the result.

    df['homewin_at_home'] = df.groupby('home')['hw'].apply(lambda x: pd.expanding_mean(x).shift())
    

    or, for more recent versions of pandas:

    df['homewin_at_home'] = df.groupby('home')['hw'].apply(lambda x: x.expanding().mean().shift())
    
      home away  hw  aw  homewin_at_home
    1    A    B   0   1              NaN
    2    B    A   1   0              NaN
    3    B    A   0   0         1.000000
    4    A    B   1   0         0.000000
    5    B    A   0   1         0.500000
    6    A    B   1   0         0.500000
    7    A    B NaN NaN         0.666667