In this MRE, I want to create a third variable called age_group, where every value that does not meet one of the conditions keeps the actual age. Even if I make age a string ('age'), the age_group variable does not replace these values with "age". What's going on here?
new_df = df %>%
  mutate(age_group = ifelse(
    age %in% c(0:35), '0 to 35 years', ifelse(
      age_options == '-66', 'Prefer not to answer', ifelse(
        age_options == '-99', 'Missing', age))
    )
  )
Data:
df = structure(list(age = c(NA, NA, 38, 33, 35, 44, 33, 26, 51, 42
), age_options = c(-99, -99, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
When you test age_options == '-66' on NA input in ifelse, the result of the test is NA instead of FALSE, so ifelse returns NA for that row rather than moving on to the following test(s). So you could deal with NAs in the age_options variable first, e.g.
df %>%
  mutate(age_group = ifelse(
    age %in% c(0:35), '0 to 35 years', ifelse(
      is.na(age_options), age, ifelse(
        age_options == '-66', 'Prefer not to answer', ifelse(
          age_options == '-99', 'Missing', age)))))
(As @rawr suggested in the comments, another option could be to use age_options %in% '-66' instead of == '-66'; this will evaluate to FALSE with NA, avoiding the problem here.)
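The difference between the two tests can be seen on a small vector. This is a minimal base R sketch; the three-element age_options vector and the 'other' fallback label are made up for illustration and are not part of the original data.

```r
# Toy vector (illustrative only): one "missing" code, one "prefer not" code, one NA
age_options <- c(-99, -66, NA)

# == propagates NA: the third test is NA, not FALSE
eq_test <- age_options == '-66'
eq_test
# FALSE TRUE NA

# %in% never returns NA: an NA input simply does not match
in_test <- age_options %in% '-66'
in_test
# FALSE TRUE FALSE

# ifelse returns NA wherever its test is NA, so the fallback is never reached
eq_result <- ifelse(eq_test, 'Prefer not to answer', 'other')
eq_result
# "other" "Prefer not to answer" NA

# With %in%, the NA row reaches the fallback
in_result <- ifelse(in_test, 'Prefer not to answer', 'other')
in_result
# "other" "Prefer not to answer" "other"
```

In the nested version, the fallback is the next ifelse, which is why an NA in age_options stops the chain before the final age value is ever reached.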
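In any case, this logic would be clearer using case_when, which is cleaner to read and doesn't have the same issue with propagating test NAs: case_when returns the specified output only when the test is TRUE, and treats an NA test as no match, so the row moves on to the next condition. One adjustment, though, is that with if_else or case_when, we need to intentionally coerce the output types to be consistent, i.e. all character, hence as.character(age) in the default. (The .default argument needs dplyr 1.1.0 or later; in older versions, use TRUE ~ as.character(age) as the last condition.)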
df %>%
  mutate(age_group = case_when(
    age %in% c(0:35) ~ '0 to 35 years',
    age_options == '-66' ~ 'Prefer not to answer',
    age_options == '-99' ~ 'Missing',
    .default = as.character(age)))
     age age_options age_group
   <dbl>       <dbl> <chr>
 1    NA         -99 Missing
 2    NA         -99 Missing
 3    38          NA 38
 4    33          NA 0 to 35 years
 5    35          NA 0 to 35 years
 6    44          NA 44
 7    33          NA 0 to 35 years
 8    26          NA 0 to 35 years
 9    51          NA 51
10    42          NA 42
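To see both case_when behaviours in isolation, here is a small sketch on plain vectors. It assumes dplyr 1.1.0 or later (for .default); the three-element vectors and the "type error" placeholder string are illustrative and not part of the original data.

```r
library(dplyr)

# Toy vectors (illustrative only)
age         <- c(NA, 38, 33)
age_options <- c(-99, NA, NA)

# An NA test counts as "no match", so element 2 falls through to .default
res <- case_when(
  age %in% 0:35      ~ '0 to 35 years',
  age_options == -66 ~ 'Prefer not to answer',
  age_options == -99 ~ 'Missing',
  .default = as.character(age)
)
res
# "Missing" "38" "0 to 35 years"

# Mixing character outputs with a numeric default is an error,
# which is why the default is wrapped in as.character()
mixed <- tryCatch(
  case_when(age %in% 0:35 ~ '0 to 35 years', .default = age),
  error = function(e) "type error"
)
mixed
# "type error"
```

The comparisons here use the numbers -66 and -99 directly rather than the strings '-66' and '-99'; both work on a numeric column, but comparing like with like avoids relying on implicit coercion.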