rdataframedplyraggregate

R sports dataframe - combining and aggregating data in several columns


I have an R dataframe of sports data like this:

 HomeTeam   AwayTeam   HomeWin     Draw     AwayWin
1 Barcelona    Madrid    2.30      3.28       3.27
2    Madrid Liverpool    8.79      5.12       1.36
3 Liverpool Barcelona    3.41      3.34       2.21
4    Madrid Barcelona    1.38      5.08       7.92
5 Liverpool    Madrid    1.53      4.07       6.83
6 Barcelona Liverpool    3.35      3.62       2.12

So separate columns for Home and Away Teams, and then 3 columns for Home, Draw and Away win odds.

I want to create a new dataframe showing the average win odds for each of the teams. So I want to take the Win odds (regardless whether they are for home or away) for each team and calculate the average. For example in case of Barcelona, the Average Win Odds are (2.30 + 2.21 + 7.92 + 3.35) / 4 = 3.95. The table I want to make looks like this:

      Team       AverageWinOdds
1   Barcelona           3.95
2    Madrid             5.07
3   Liverpool           2.11

How can I do this ?


Solution

  • 1) Using the data shown reproducibly in the Note at the end use reframe to create a two column data frame and then summarize it.

    library(dplyr)
    
    dat %>%
      reframe(Team = c(HomeTeam, AwayTeam), Win = c(HomeWin, AwayWin)) %>%
      summarize(avgWin = mean(Win), .by = Team)
    ##        Team avgWin
    ## 1 Barcelona 3.9450
    ## 2    Madrid 5.0675
    ## 3 Liverpool 2.1050
    

    2) An alternative is to use pivot_longer to create a long form data frame and then summarize as before.

    library(dplyr)
    library(tidyr)
    
    dat %>%
      pivot_longer(cols = 1:2, values_to = "Team") %>%
      mutate(Win = ifelse(name == "HomeTeam", HomeWin, AwayWin)) %>%
      summarize(avgWin = mean(Win), .by = Team)
    ## # A tibble: 3 × 2
    ##   Team      avgWin
    ##   <chr>      <dbl>
    ## 1 Barcelona   3.94
    ## 2 Madrid      5.07
    ## 3 Liverpool   2.10
    

    Note

    The input data in easily reproducible form:

    dat <- data.frame(
      HomeTeam = c("Barcelona", "Madrid", "Liverpool", "Madrid", "Liverpool", "Barcelona"),
      AwayTeam = c("Madrid", "Liverpool", "Barcelona", "Barcelona", "Madrid", "Liverpool"),
      HomeWin = c(2.3, 8.79, 3.41, 1.38, 1.53, 3.35),
      Draw = c(3.28, 5.12, 3.34, 5.08, 4.07, 3.62),
      AwayWin = c(3.27, 1.36, 2.21, 7.92, 6.83, 2.12)
    )