rbernoulli-probability

Generate a Bernoulli variable from vector with probabilities [r]


I'm having trouble with what seems like a basic problem. I searched for existing threads on the same issue but couldn't find any.

I'm trying to figure out how to generate a Bernoulli variable (y) based on probabilities (z) that I have generated for each observation. I've created the made-up dataset below to illustrate the problem.

x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

df <- data.frame(x, z)

I want to add a variable y: a binary variable drawn according to the probabilities in z.

I tried the following:

library(dplyr)

df <- df %>%
  mutate(y = rbinom(1,1,z))

But this seems to give every observation the same value, rather than a value based on each observation's own probability.

Does anyone know how to solve this?

Thanks!


Solution

  • From the online documentation for rbinom:

    rbinom(n, size, prob)
    n: number of observations. If length(n) > 1, the length is taken to be the number required.
    

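    So with n = 1 only a single value is drawn. Pass the number of rows as n to get one draw per observation: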

    df <- df %>%
      mutate(y = rbinom(nrow(df), 1, z))
    df
    > df
      x     z y
    1 A 0.110 0
    2 B 0.230 1
    3 C 0.250 0
    4 D 0.060 0
    5 E 0.100 0
    6 F 0.032 0
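
    As far as I can tell, the original attempt fails because rbinom(1, 1, z) returns a single draw (made with the first probability only), which mutate() then recycles to every row. A minimal base R sketch of the difference, using z from the question; the seed is arbitrary and the names one_draw, per_row and y_alt are just illustrative:

    ```r
    z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

    set.seed(1)  # arbitrary seed, only so the run is repeatable

    # n = 1: a single draw; only z[1] is used as the probability
    one_draw <- rbinom(1, 1, z)
    length(one_draw)

    # n = length(z): one draw per element, each with its own probability
    per_row <- rbinom(length(z), 1, z)
    length(per_row)

    # Equivalent idea without rbinom: compare uniform draws against z
    y_alt <- as.integer(runif(length(z)) < z)
    ```

    Inside mutate(), dplyr's n() should work in place of nrow(df), i.e. mutate(y = rbinom(n(), 1, z)), which avoids referring to df by name.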
    

    To demonstrate that events are generated with the correct probabilities, repeat each observation 500 times and compare the group means with z:

    df <- data.frame(x=rep(x, each=500), z=rep(z, each=500))
    df <- df %>%
      mutate(y = rbinom(nrow(df), 1, z))
    df %>% group_by(x) %>% summarise(y=mean(y), .groups="drop")
    # A tibble: 6 x 2
      x         y
      <fct> <dbl>
    1 A     0.114
    2 B     0.232
    3 C     0.25 
    4 D     0.06 
    5 E     0.106
    6 F     0.018
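
    With only 500 draws per group the estimates are fairly noisy: for F the standard error is roughly sqrt(0.032 * 0.968 / 500), about 0.008, so 0.018 is within about two standard errors of 0.032. A rough base R sketch of the same check with more draws and the standard errors alongside; the seed, n_rep and the name sim are assumptions for illustration, not part of the answer above:

    ```r
    x <- c("A", "B", "C", "D", "E", "F")
    z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

    set.seed(42)     # arbitrary seed for repeatability
    n_rep <- 100000  # assumed number of draws per observation

    # Repeat every observation n_rep times and draw one Bernoulli value per row
    sim <- data.frame(x = rep(x, each = n_rep), z = rep(z, each = n_rep))
    sim$y <- rbinom(nrow(sim), 1, sim$z)

    # Observed proportion of 1s per group, and the theoretical standard error
    est <- tapply(sim$y, sim$x, mean)
    se <- sqrt(z * (1 - z) / n_rep)

    round(cbind(target = z, estimate = est[x], se = se), 4)
    ```

    The estimate column should sit within a few standard errors of target for every group.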