Tags: r, dplyr, sample, mutate

Use sample() function within mutate() and case_when()


Let's say my input dataset is given by df2:

df2 <- data.frame(a = c(1,NA,6,NA), b = c(2,4,5,1))

a b
1 2
NA 4
6 5
NA 1

I would like to create a third variable, c, that takes the value of b when a is not missing. When a is missing (rows 2 and 4), c should randomly take either 0 or that row's value of b.

In terms of code, I was thinking of something like this:

library(dplyr)

df2 <- df2 %>% 
  mutate(c = case_when(is.na(a) ~ sample(c(0, b), n(), replace = TRUE),
                       TRUE ~ b))

But it doesn't give me the result I want.

Any idea?


Solution

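  • sample() does not vectorize the way you want here: sample(c(0, b), n(), replace = TRUE) draws n() values from the pooled vector of 0 and every value in b, so a row can end up with another row's b. We could use if_else() instead: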

    df2 %>% 
      mutate(c = case_when(is.na(a) ~ if_else(runif(n()) < .5, 0, b),
                           TRUE ~ b))
    

    We use runif() to draw one uniform random number per row. If it is less than .5 we return 0; otherwise we return that row's b. For example:

    set.seed(369)
    df2 %>% 
      mutate(c = case_when(is.na(a) ~ if_else(runif(n()) < .5, 0, b),
                           TRUE ~ b))
    #    a b c
    # 1  1 2 2
    # 2 NA 4 0
    # 3  6 5 5
    # 4 NA 1 1
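
If you would rather keep sample() in the pipeline, one possible variant is to sample a 0/1 multiplier per row rather than sampling from b itself. This is only a sketch of the same idea as the runif() approach: the names pool, keep, and res are made up for illustration, and the seed is arbitrary.

```r
library(dplyr)

df2 <- data.frame(a = c(1, NA, 6, NA), b = c(2, 4, 5, 1))

# What the original attempt samples from: 0 pooled with ALL values of b,
# which is why a row could receive another row's b.
pool <- c(0, df2$b)  # 0 2 4 5 1

set.seed(369)  # arbitrary seed, only so the draw is reproducible

res <- df2 %>%
  mutate(
    # one independent 0/1 draw per row (hypothetical helper column)
    keep = sample(c(0, 1), n(), replace = TRUE),
    # missing a: b * keep is either 0 or this row's own b; otherwise keep b
    c = if_else(is.na(a), b * keep, b)
  ) %>%
  select(-keep)

print(res)
```

Because the multiplier is drawn once per row, c can only ever be 0 or the b from the same row, which is the behaviour asked for in the question.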