library(tidyverse)
set.seed(1)
start <- mdy("01/01/2022")
end <- start + as.difftime(4, units = "days")
days <- seq(from = start, to = end, by = 1)
days <- sample(days, 100, replace = T)
flip <- sample(c("Heads", "Tails"), 100, replace = TRUE)
numbers <- rchisq(100, 30)
df <- tibble(days, numbers, flip)
I have this dataframe and would like to summarize it for each day by taking the total sum of numbers for each flip and creating this ratio, (Heads-Tails)/(Heads+Tails) below is an example for a particular day:
df <- df %>% group_by(days, flip) %>%
summarize(total = sum(numbers)) %>%
ungroup()
> (df[[1,3]]-df[[2,3]])/(df[[1,3]]+df[[2,3]])
[1] -0.4164241
> (df[[3,3]]-df[[4,3]])/(df[[3,3]]+df[[4,3]])
[1] 0.06134484
> (df[[5,3]]-df[[6,3]])/(df[[5,3]]+df[[6,3]])
[1] 0.6067984
> (df[[7,3]]-df[[8,3]])/(df[[7,3]]+df[[8,3]])
[1] -0.2603271
> (df[[9,3]]-df[[10,3]])/(df[[9,3]]+df[[10,3]])
[1] 0.309745
These are the ratios I would like to end up with for every day.
Starting from the first df
,
df %>%
group_by(days, flip) %>%
summarize(total = sum(numbers)) %>%
group_by(days) %>%
summarize(z = -diff(total)/sum(total))
# # A tibble: 5 × 2
# days z
# <date> <dbl>
# 1 2022-01-01 -0.416
# 2 2022-01-02 0.0613
# 3 2022-01-03 0.607
# 4 2022-01-04 -0.260
# 5 2022-01-05 0.310
though this relies on the order of flip
. To be a little more resilient to that, we can do:
... %>%
summarize(z = (total[flip == "Heads"] - total[flip == "Tails"]) / sum(total))
We can also try a pivoting approach:
df %>%
group_by(days, flip) %>%
summarize(total = sum(numbers)) %>%
ungroup() %>%
pivot_wider(id_cols = days, names_from = flip, values_from = total) %>%
mutate(z = (Heads - Tails) / (Heads + Tails))
# # A tibble: 5 × 4
# days Heads Tails z
# <date> <dbl> <dbl> <dbl>
# 1 2022-01-01 205. 498. -0.416
# 2 2022-01-02 315. 279. 0.0613
# 3 2022-01-03 363. 88.8 0.607
# 4 2022-01-04 227. 386. -0.260
# 5 2022-01-05 403. 212. 0.310