I have an R dataframe of sports data like this:
HomeTeam AwayTeam HomeWin Draw AwayWin
1 Barcelona Madrid 2.30 3.28 3.27
2 Madrid Liverpool 8.79 5.12 1.36
3 Liverpool Barcelona 3.41 3.34 2.21
4 Madrid Barcelona 1.38 5.08 7.92
5 Liverpool Madrid 1.53 4.07 6.83
6 Barcelona Liverpool 3.35 3.62 2.12
So separate columns for Home and Away Teams, and then 3 columns for Home, Draw and Away win odds.
I want to create a new dataframe showing the average win odds for each of the teams. So I want to take the Win odds (regardless whether they are for home or away) for each team and calculate the average. For example in case of Barcelona, the Average Win Odds are (2.30 + 2.21 + 7.92 + 3.35) / 4 = 3.95. The table I want to make looks like this:
Team AverageWinOdds
1 Barcelona 3.95
2 Madrid 5.07
3 Liverpool 2.11
How can I do this ?
1) Using the data shown reproducibly in the Note at the end use reframe
to create a two column data frame and then summarize
it.
library(dplyr)
dat %>%
reframe(Team = c(HomeTeam, AwayTeam), Win = c(HomeWin, AwayWin)) %>%
summarize(avgWin = mean(Win), .by = Team)
## Team avgWin
## 1 Barcelona 3.9450
## 2 Madrid 5.0675
## 3 Liverpool 2.1050
2) An alternative is to use pivot_longer
to create a long form data frame and then summarize
as before.
library(dplyr)
library(tidyr)
dat %>%
pivot_longer(cols = 1:2, values_to = "Team") %>%
mutate(Win = ifelse(name == "HomeTeam", HomeWin, AwayWin)) %>%
summarize(avgWin = mean(Win), .by = Team)
## # A tibble: 3 × 2
## Team avgWin
## <chr> <dbl>
## 1 Barcelona 3.94
## 2 Madrid 5.07
## 3 Liverpool 2.10
The input data in easily reproducible form:
dat <- data.frame(
HomeTeam = c("Barcelona", "Madrid", "Liverpool", "Madrid", "Liverpool", "Barcelona"),
AwayTeam = c("Madrid", "Liverpool", "Barcelona", "Barcelona", "Madrid", "Liverpool"),
HomeWin = c(2.3, 8.79, 3.41, 1.38, 1.53, 3.35),
Draw = c(3.28, 5.12, 3.34, 5.08, 4.07, 3.62),
AwayWin = c(3.27, 1.36, 2.21, 7.92, 6.83, 2.12)
)