pythonpandasrolling-sum

pandas:conditional rolling sum of two columns


I'd like to calculate a measure for momentum of soccer teams, in this case the points that a certain team got in its last 3 games. My data looks as follows:

    HomeTeam    AwayTeam    H_Pts   A_Pts
    Barcelona   Getafe      3       0
    Levante     Barcelona   1       1
    Barcelona   Las Palmas  3       0
    Las Palmas  Barcelona   3       0
    Barcelona   Madrid      1       1

This is just a sample for some Barcelona games. So basically what i want to end up with is two additional columns (say Home_Momentum, Away_Momentum), which adds up the points that this particular team got in its last 3 matches (not including the current one). So it should look something like this:

    HomeTeam    AwayTeam    H_Pts   A_Pts    Home_Momentum    Away_Momentum
    Barcelona   Getafe      3       0        NaN              NaN
    Levante     Barcelona   1       1        NaN              NaN
    Barcelona   Las Palmas  3       0        NaN              NaN
    Las Palmas  Barcelona   3       0        x                7    
    Barcelona   Madrid      1       1        4                y

where x (y) is the sum of points that Las Palmas (Madrid) collected in its last 3 games.

What I've come up with so far is:

data["Home_Momentum"] = data.groupby("HomeTeam")["H_Pts"].apply(lambda x: x.rolling(3).sum().shift())

But the problem with this is that it doesnt consider the away games of the team.

Do you have any ideas how to solve this?


Solution

  • Rename, columns to be a multiindex. The stack and run a rolling sum

    df.columns = [
        ['Team', 'Team', 'Points', 'Points'],
        ['Home', 'Away', 'Home', 'Away']
    ]
    
    d1 = df.stack()
    
    mom = d1.groupby('Team').Points.apply(lambda x: x.shift().rolling(3).sum())
    
    d1.assign(Momentum=mom).unstack()
    
      Points             Team             Momentum     
        Away Home        Away        Home     Away Home
    0      0    3      Getafe   Barcelona      NaN  NaN
    1      1    1   Barcelona     Levante      NaN  NaN
    2      0    3  Las Palmas   Barcelona      NaN  NaN
    3      0    3   Barcelona  Las Palmas      7.0  NaN
    4      1    1      Madrid   Barcelona      NaN  4.0
    

    We can include sums over few games than 3 as well.

    df.columns = [
        ['Team', 'Team', 'Points', 'Points'],
        ['Home', 'Away', 'Home', 'Away']
    ]
    
    d1 = df.stack()
    
    mom = d1.groupby('Team').Points.apply(lambda x: x.shift().rolling(3, 1).sum())
    
    d1.assign(Momentum=mom).unstack()
    
      Points             Team             Momentum     
        Away Home        Away        Home     Away Home
    0      0    3      Getafe   Barcelona      NaN  NaN
    1      1    1   Barcelona     Levante      3.0  NaN
    2      0    3  Las Palmas   Barcelona      NaN  4.0
    3      0    3   Barcelona  Las Palmas      7.0  0.0
    4      1    1      Madrid   Barcelona      NaN  4.0