Can the mutate be used when the mutation is conditional (depending on the values of certain column values)?
This example helps showing what I mean.
structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4,
2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
8L), class = "data.frame")
a b c d e f
1 1 1 6 6 1 2
2 3 3 3 2 2 3
3 4 4 6 4 4 4
4 6 2 5 5 5 2
5 3 6 3 3 6 2
6 2 7 6 7 7 7
7 5 2 5 2 6 5
8 1 6 3 6 3 2
I was hoping to find a solution to my problem using the dplyr package (and yes I know this not code that should work, but I guess it makes the purpose clear) for creating a new column g:
library(dplyr)
df <- mutate(df,
if (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)){g = 2},
if (a == 0 | a == 1 | a == 4 | a == 3 | c == 4) {g = 3})
The result of the code I am looking for should have this result in this particular example:
a b c d e f g
1 1 1 6 6 1 2 3
2 3 3 3 2 2 3 3
3 4 4 6 4 4 4 3
4 6 2 5 5 5 2 NA
5 3 6 3 3 6 2 NA
6 2 7 6 7 7 7 2
7 5 2 5 2 6 5 2
8 1 6 3 6 3 2 3
Does anyone have an idea about how to do this in dplyr? This data frame is just an example, the data frames I am dealing with are much larger. Because of its speed I tried to use dplyr, but perhaps there are other, better ways to handle this problem?
Use ifelse
df %>%
mutate(g = ifelse(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
ifelse(a == 0 | a == 1 | a == 4 | a == 3 | c == 4, 3, NA)))
Added - if_else: Note that in dplyr 0.5 there is an if_else
function defined so an alternative would be to replace ifelse
with if_else
; however, note that since if_else
is stricter than ifelse
(both legs of the condition must have the same type) so the NA
in that case would have to be replaced with NA_real_
.
df %>%
mutate(g = if_else(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
if_else(a == 0 | a == 1 | a == 4 | a == 3 | c == 4, 3, NA_real_)))
Added - case_when Since this question was posted dplyr has added case_when
so another alternative would be:
df %>% mutate(g = case_when(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
a == 0 | a == 1 | a == 4 | a == 3 | c == 4 ~ 3,
TRUE ~ NA_real_))
Added - arithmetic/na_if If the values are numeric and the conditions (except for the default value of NA at the end) are mutually exclusive, as is the case in the question, then we can use an arithmetic expression such that each term is multiplied by the desired result using na_if
at the end to replace 0 with NA.
df %>%
mutate(g = 2 * (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) +
3 * (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
g = na_if(g, 0))