Consider a basketball series, best 4 out of 7. In R, we have the following function for computing the expected number of wins in such a series for a team with a certain single-game win probability wp_a:
get_expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  # compute expected wins for team_a
  # wp_a: a team's probability of winning a single game
  # num_games: the maximum number of possible games remaining in the series
  # to_win: how many more games a team needs to win the series
  # 7,4 correspond to winning a best 4 out of 7 series
  # expected wins for the team
  prob_to_win_n_games <- dbinom(x = 0:num_games, size = num_games, prob = wp_a)
  num_wins <- c(0:to_win, rep(to_win, num_games - to_win))
  ewins <- sum(prob_to_win_n_games * num_wins)
  # and return
  return(ewins)
}
In the function, prob_to_win_n_games should be the team's probability of winning 0, 1, 2, ..., up to num_games games. Consider a playoff series where a team is trailing 0-3, and we are trying to compute their expected number of remaining wins in the series. Keep in mind that 1 more loss by the team would end the series. We want to call get_expected_wins(0.5, 4, 4).
In this series, this team has a 50% chance of winning 0 more games (lose the next game), 25% to win 1 game (win, then lose), 12.5% to win 2 games (win, win, then lose), 6.25% to win 3 games (win, win, win, then lose), and 6.25% to win 4 games (win 4x). Their expected wins in the series are then 0.5*0 + 0.25*1 + 0.125*2 + 0.0625*3 + 0.0625*4 = 0.9375.
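The arithmetic above can be checked with a few lines of R. This is only a minimal sketch for the 0-3 scenario, assuming a constant single-game win probability and independent games; the variable names are illustrative and not part of the original function.

```r
# Sketch for the trailing 0-3 case: any loss ends the series.
wp_a <- 0.5      # single-game win probability (assumed constant)
to_win <- 4      # wins still needed to take the series

# k wins followed by a loss, for k = 0, ..., to_win - 1
k <- 0:(to_win - 1)
p_lose_with_k <- wp_a^k * (1 - wp_a)

# winning every remaining game is the only way to reach to_win wins
probs <- setNames(c(p_lose_with_k, wp_a^to_win), 0:to_win)

# expected number of remaining wins
ewins <- sum(probs * 0:to_win)
print(probs)
print(ewins)  # 0.9375 for wp_a = 0.5
```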
In this example, num_games = 4 and to_win = 4, and prob_to_win_n_games is incorrectly computed as 0.0625 0.2500 0.3750 0.2500 0.0625. The binomial fails to account for the series ending after an additional loss. It computes a 25% chance of 3 wins, based on the calculation (4 choose 3) * (0.5 ^ 4), but 3 of the 4 possible sequences (L W W W, W L W W, W W L W) are not possible in our theoretical playoff series, where one additional loss by the team would end the series. Only W W W L gets the team to 3 wins.
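The mismatch can be made concrete by brute force. The sketch below enumerates all 16 win/loss sequences for 4 games at wp_a = 0.5, counts only the wins that occur before the first loss, and prints the result next to the plain binomial. It is an illustration for this one scenario (any loss ends the series), not a general solution, and the object names are made up for the example.

```r
# Enumerate all 2^4 win/loss sequences for 4 games (1 = win, 0 = loss).
seqs <- expand.grid(rep(list(c(0, 1)), 4))

# Wins before the first loss: cumprod() stays 1 until the first 0 appears.
wins_before_loss <- apply(seqs, 1, function(s) sum(cumprod(s)))

# With wp_a = 0.5 every sequence has probability 1/16.
truncated <- table(wins_before_loss) / nrow(seqs)

# The plain binomial counts wins across all 4 games regardless of order.
binomial <- dbinom(0:4, size = 4, prob = 0.5)

print(truncated)
print(binomial)
```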
How can we update this function to correctly compute a team's probability of winning a certain number of games, given the parameters we set for the playoff series?
If the number of wins is less than to_win, the top number in the binomial coefficient (the first argument of choose) has to be 1 less than the one dbinom would use.
The reason for this is that the only way to lose a series is to lose the final game of the series. There is no other restriction on the ordering of the wins/losses for the loser. This means the wins for the series loser can be distributed among all but the last game, which is why we must subtract one from the top number in the binomial coefficient.
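Written out as a formula, the argument looks as follows. The symbols p = wp_a and m = num_games - to_win + 1 (the number of losses that ends the series) are shorthand introduced here for illustration. For a series lost with k wins, the last of the k + m games is a loss and the k wins fall among the first k + m - 1 games:

```latex
P(k \text{ wins}) = \binom{k + m - 1}{k}\, p^{k} (1 - p)^{m},
\qquad k = 0, 1, \dots, \mathrm{to\_win} - 1,
\qquad
P(\mathrm{to\_win} \text{ wins}) = 1 - \sum_{k=0}^{\mathrm{to\_win} - 1} P(k \text{ wins}).
```

A plain binomial over the same k + m games would use \binom{k + m}{k} instead. In the 0-3 example m = 1, so the formula reduces to p^k (1 - p), which gives the 50%, 25%, 12.5%, 6.25% figures from the question.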
The following function returns the probabilities of seeing 0:to_win wins:
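get_expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  # i = to_win - (number of wins), so i = to_win:1 covers 0, ..., to_win - 1 wins
  i <- to_win:1
  # series lost with to_win - i wins: the final game is a loss, so the wins
  # are spread over the num_games - i games before it
  wins <- choose(num_games - i, to_win - i)*wp_a^(to_win - i)*(1 - wp_a)^(num_games - to_win + 1)
  # the remaining probability is that of winning the series (to_win wins)
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}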
get_expected_wins(0.5, 7, 4)
#> 0 1 2 3 4
#> 0.06250 0.12500 0.15625 0.15625 0.50000
get_expected_wins(0.5, 6, 3)
#> 0 1 2 3
#> 0.06250 0.12500 0.15625 0.65625
get_expected_wins(0.5, 4, 4)
#> 0 1 2 3 4
#> 0.5000 0.2500 0.1250 0.0625 0.0625
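Since the function returns a probability distribution and not a single number, the expected number of wins asked for in the question still has to be computed from it. A minimal sketch, assuming the distribution function above is kept under the hypothetical name win_distribution and wrapped by a second hypothetical helper expected_wins:

```r
# Hypothetical name; the body follows the function given above.
win_distribution <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  i <- to_win:1
  wins <- choose(num_games - i, to_win - i) * wp_a^(to_win - i) *
    (1 - wp_a)^(num_games - to_win + 1)
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}

# Expected wins = sum over k of k * P(k wins).
expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  probs <- win_distribution(wp_a, num_games, to_win)
  sum(0:to_win * probs)
}

expected_wins(0.5, 4, 4)  # 0.9375, matching the hand calculation for the 0-3 case
```

Keeping the distribution and the expectation in separate functions is a design choice for the sketch; the two could equally be merged into one function that returns only the expected value.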
Alternatively,
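get_expected_wins <- function(wp_a = 0.50, num_games = 7L, to_win = 4L) {
  # k: number of wins when the series is lost
  k <- 0:(to_win - 1L)
  # n: total number of games played in a lost series with k wins
  n <- (num_games - to_win + 1L):num_games
  # choose(n - 1, k) = choose(n, k)*(n - k)/n, so rescale dbinom accordingly
  wins <- dbinom(k, n, wp_a)*(n - k)/n
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}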
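The dbinom version relies on the identity choose(n - 1, k) = choose(n, k) * (n - k) / n, so it should agree with the choose version. The sketch below checks this numerically on a few parameter settings; dist_choose and dist_dbinom are hypothetical names given to the two formulations so they can coexist, and the test cases are arbitrary.

```r
# Hypothetical names for the two formulations shown above.
dist_choose <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  i <- to_win:1
  wins <- choose(num_games - i, to_win - i) * wp_a^(to_win - i) *
    (1 - wp_a)^(num_games - to_win + 1)
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}

dist_dbinom <- function(wp_a = 0.50, num_games = 7L, to_win = 4L) {
  k <- 0:(to_win - 1L)
  n <- (num_games - to_win + 1L):num_games
  wins <- dbinom(k, n, wp_a) * (n - k) / n
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}

# Compare the two on a handful of (num_games, to_win) pairs.
cases <- list(c(7, 4), c(6, 3), c(4, 4), c(5, 3))
agree <- vapply(cases, function(x) {
  isTRUE(all.equal(dist_choose(0.6, x[1], x[2]), dist_dbinom(0.6, x[1], x[2])))
}, logical(1))
print(agree)
```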