Let's say my input dataset is given by df2:
df2 <- data.frame(a = c(1,NA,6,NA), b = c(2,4,5,1))
a | b |
---|---|
1 | 2 |
NA | 4 |
6 | 5 |
NA | 1 |
I would like to create a third variable called "c" which takes the value of b if a is not missing. If a is missing (rows 2 and 4), c should randomly take either the value 0 or the value of b.
In terms of code, I was thinking of doing something like this:
df2 <- df2 %>%
mutate(c=case_when(is.na(a) ~ sample(c(0,b),n(),replace=TRUE),
TRUE ~ b))
But it doesn't give me the result I want.
Any idea?
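The sample function won't vectorize the way you want in this case: sample(c(0,b), n(), replace=TRUE) draws from 0 pooled with every value of b, rather than choosing between 0 and each row's own b. We could use if_else instead: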
df2 %>%
  mutate(c = case_when(is.na(a) ~ if_else(runif(n()) < 0.5, 0, b),
                       TRUE ~ b))
We use runif() to draw a uniform random number between 0 and 1 for each row. If it's less than 0.5 we return 0, otherwise we return b. For example:
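library(dplyr)  # provides %>%, mutate, case_when, if_else, n

set.seed(369)   # fix the seed so the random draws are reproducible
df2 %>%
  mutate(c = case_when(is.na(a) ~ if_else(runif(n()) < 0.5, 0, b),
                       TRUE ~ b))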
#    a b c
# 1  1 2 2
# 2 NA 4 0
# 3  6 5 5
# 4 NA 1 1
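If you'd rather avoid dplyr, here is a minimal base-R sketch of the same idea. It first prints the vector that the original sample() call was drawing from, then uses one coin flip per row instead. The names pool, keep and c_alt are made up for illustration, and the seed is arbitrary.

```r
df2 <- data.frame(a = c(1, NA, 6, NA), b = c(2, 4, 5, 1))

# What sample(c(0, b), ...) draws from: 0 plus ALL values of b,
# so a row could end up with another row's b
pool <- c(0, df2$b)
print(pool)  # [1] 0 2 4 5 1

# Alternative: one fair coin flip per row (1 = keep b, 0 = replace with 0)
set.seed(1)  # arbitrary seed, only for reproducibility
keep <- rbinom(nrow(df2), size = 1, prob = 0.5)

# Rows where a is missing get b * keep (either 0 or b);
# all other rows get b unchanged
df2$c_alt <- ifelse(is.na(df2$a), df2$b * keep, df2$b)
print(df2)
```

Multiplying b by a 0/1 draw does the same job as the if_else(runif(n()) < 0.5, 0, b) step above, just written without dplyr.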