I'm stuck on what is probably a basic problem. I searched for existing threads on the same issue but couldn't find any.
I'm trying to generate a Bernoulli variable (y) based on probabilities (z) that I have generated for each observation. I've created the fictitious dataset below to represent my problem.
x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)
df <- data.frame(x, z)
I want to add a binary variable y, drawn for each row with the probability given in z for that row.
I tried the following:
library(dplyr)
df <- df %>%
  mutate(y = rbinom(1, 1, z))
But it seems to give the same value to every observation, rather than drawing each one from that observation's own probability.
Does anyone know how to solve this?
Thanks!
From the online documentation for rbinom:
rbinom(n, size, prob)
n: number of observations. If length(n) > 1, the length is taken to be the number required.
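So with n = 1 only a single value is drawn, and mutate() recycles it across every row. Ask for one draw per row instead: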
df <- df %>%
mutate(y = rbinom(nrow(df), 1, z))
df
  x     z y
1 A 0.110 0
2 B 0.230 1
3 C 0.250 0
4 D 0.060 0
5 E 0.100 0
6 F 0.032 0
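For what it's worth, the difference between the two calls can be seen without dplyr at all. The sketch below is only an illustration: the seed and the names one_draw and draws are my own, not part of the question.

```r
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# n = 1: a single draw, which uses only the first probability in z
one_draw <- rbinom(1, 1, z)
length(one_draw)  # 1

# n = length(z): one draw per element, each with its own probability
draws <- rbinom(length(z), 1, z)
length(draws)     # 6
```

Inside mutate(), n() returns the number of rows in the current group, so rbinom(n(), 1, z) should behave the same as the nrow(df) version on ungrouped data.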
To demonstrate that events are generated with the correct probabilities, repeat each row 500 times and compare the proportion of 1s per group with z:
df <- data.frame(x=rep(x, each=500), z=rep(z, each=500))
df <- df %>%
mutate(y = rbinom(nrow(df), 1, z))
df %>% group_by(x) %>% summarise(y = mean(y), .groups = "drop")
# A tibble: 6 x 2
  x         y
  <fct> <dbl>
1 A     0.114
2 B     0.232
3 C     0.25
4 D     0.06
5 E     0.106
6 F     0.018
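As a cross-check, the same kind of variable can be built in base R by comparing uniform draws against z, since P(U < p) = p for U uniform on (0, 1). This is a different technique from rbinom, shown only as a sketch; the seed, the 10,000 replicates, and the names reps, big and obs are assumptions of mine.

```r
x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

set.seed(42)  # arbitrary seed for repeatability

# 10,000 replicates per group (my choice; the check above used 500)
reps <- 10000
big <- data.frame(x = rep(x, each = reps), z = rep(z, each = reps))

# One uniform draw per row; the row gets y = 1 when the draw falls below its z
big$y <- as.integer(runif(nrow(big)) < big$z)

# Observed proportion of 1s per group, to compare against z
obs <- tapply(big$y, big$x, mean)
round(obs, 3)
```

With more replicates the per-group proportions should sit closer to z than in the 500-replicate run, where small probabilities such as F's are estimated quite noisily.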