I have a dataframe in R with results of sports matches. Each row represents one match between two players with columns for the date of the match, the player names, and the scores of each player.
event.date | home.name | away.name | home.score | away.score |
---|---|---|---|---|
2024-06-15 | A | B | 7 | 5 |
2024-06-14 | A | C | 7 | 2 |
2024-06-13 | B | C | 3 | 7 |
df <- tibble(
event.date=c('2024-06-15','2024-06-14','2024-06-13'),
home.name=c('A','B','C'),
away.name=c('B','C','C'),
home.score=c(7,7,3),
away.score=c(5,2,7)
)
I want to turn this "summarized" table into a table listing out the total score for each player in each match by turning the home.score and away.score columns into binary responses. For example, for the match between A and B on 2024-06-15, there should be 12 rows total: 7 with home.score = 1 and away.score = 0, and 5 with home.score = 0 and away.score = 1.
event.date | home.name | away.name | home.score | away.score |
---|---|---|---|---|
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 1 | 0 |
2024-06-15 | A | B | 0 | 1 |
2024-06-15 | A | B | 0 | 1 |
2024-06-15 | A | B | 0 | 1 |
2024-06-15 | A | B | 0 | 1 |
2024-06-15 | A | B | 0 | 1 |
For the example above, the final table generated from df
should have 31 rows - one for each "point" in the scores of all the matches.
I thought I could use tidyr::uncount()
to increase the rows of my table by the values of home.score
and away.score
, but the weights
argument only takes one column as the parameter - I can't use it across both home.score
and away.score
.
> uncount(df,weights=home.score)
# A tibble: 17 × 4
event.date home.name away.name away.score
<chr> <chr> <chr> <dbl>
1 2024-06-15 A B 5
2 2024-06-15 A B 5
3 2024-06-15 A B 5
4 2024-06-15 A B 5
5 2024-06-15 A B 5
6 2024-06-15 A B 5
7 2024-06-15 A B 5
8 2024-06-14 B C 2
9 2024-06-14 B C 2
10 2024-06-14 B C 2
11 2024-06-14 B C 2
12 2024-06-14 B C 2
13 2024-06-14 B C 2
14 2024-06-14 B C 2
15 2024-06-13 C C 7
16 2024-06-13 C C 7
17 2024-06-13 C C 7
reframe
seems like a good candidate to do this. My first attempt was:
df |> reframe(
home.score = rep(c(1, 0), c(home.score, away.score)),
away.score = rep(c(0, 1), c(home.score, away.score)),
.by = c(event.date, home.name, away.name)
)
Which I think is nice and readable code. But unfortunately, the home.score
value has already been modified when we want to calculate the away.score
, so we have to use this more confusing version, where we use the sum of the home score (i.e. the sum of the vector of 0s and 1s):
df |> reframe(
home.score = rep(c(1, 0), c(home.score, away.score)),
away.score = rep(c(0, 1), c(sum(home.score), away.score)),
.by = c(event.date, home.name, away.name)
)
Which gives:
# A tibble: 31 × 5 event.date home.name away.name home.score away.score <chr> <chr> <chr> <dbl> <dbl> 1 2024-06-15 A B 1 0 2 2024-06-15 A B 1 0 3 2024-06-15 A B 1 0 4 2024-06-15 A B 1 0 5 2024-06-15 A B 1 0 6 2024-06-15 A B 1 0 7 2024-06-15 A B 1 0 8 2024-06-15 A B 0 1 9 2024-06-15 A B 0 1 10 2024-06-15 A B 0 1 # ℹ 21 more rows # ℹ Use `print(n = ...)` to see more rows