I have a column that can fall into one of two cases:
1.- It contains both NA and non-NA values.
2.- It contains only NA.
I need a single mutate() that handles both cases for the same column. However, when I try it on an empty column (one that contains only NA), I get an error.
How can I handle a completely empty column inside mutate() using case_when(), while still covering the other case?
library(dplyr)

case_1 <- tibble("a" = c("a", "a", "a", "a"),
                 "b" = c(NA, "b", NA, "b"))
case_2 <- tibble("a" = c("a", "a", "a", "a"),
                 "b" = c(NA, NA, NA, NA))

# Column b has at least one non-NA value
case_1 <- case_1 %>% mutate("a" = case_when(!is.na(a) ~ a,
                                            is.na(a) ~ na.omit(unique(a))),
                            "b" = case_when(!is.na(b) ~ b,
                                            is.na(b) ~ na.omit(unique(b))))

# Column b contains only NA
case_2 <- case_2 %>% mutate("a" = case_when(!is.na(a) ~ a,
                                            is.na(a) ~ na.omit(unique(a))),
                            "b" = case_when(!is.na(b) ~ b,
                                            is.na(b) ~ na.omit(unique(b))))
The first case (case_1) works fine, but case_2 throws an error.
It looks as if you're trying to fill NA values with the nearest non-NA value, both forwards and backwards. You can achieve this with tidyr::fill(), making sure you specify the .direction argument:
case_1 |>
  tidyr::fill(b, .direction = "downup")
# a b
# <chr> <chr>
# 1 a b
# 2 a b
# 3 a b
# 4 a b
case_2 |>
  tidyr::fill(b, .direction = "downup")
# a b
# <chr> <lgl>
# 1 a NA
# 2 a NA
# 3 a NA
# 4 a NA
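To see what .direction = "downup" does when a column holds more than one distinct value, here is a small sketch on a made-up data frame (the column x and its values are illustrative and not taken from the question). Each NA is first filled from the value above it, and any NA left at the top is then filled from the value below:

```r
library(tidyr)

# Hypothetical example data, not taken from the question
df <- data.frame(x = c(NA, 1, NA, 2, NA))

# Fill downwards first, then upwards for any leading NA
filled <- fill(df, x, .direction = "downup")
filled$x
# 1 1 1 2 2
```

This is where fill() and na.omit(unique(x)) part ways: with several distinct non-NA values, na.omit(unique(x)) returns more than one value, whereas fill() uses the nearest neighbour for each gap.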
However, in the general case where you want to use dplyr::case_when() and all the values in b might be NA, you can't have na.omit(unique(b)) on the right-hand side, because it evaluates to a zero-length vector. That is the wrong length to assign to a column of a data frame with one or more rows, and it cannot be recycled into a vector of the correct length. case_when() evaluates all the possible return vectors, even if they are never assigned. For example:
case_2 |>
  mutate(
    "b" = case_when(
      TRUE ~ NA,
      FALSE ~ na.omit(unique(b))
    )
  )
# Error in `mutate()`:
# ℹ In argument: `b = case_when(TRUE ~ NA, FALSE ~ na.omit(unique(b)))`.
# Caused by error:
# ! `b` must be size 4 or 1, not 0.
# Run `rlang::last_trace()` to see where the error occurred.
Clearly, FALSE can never be TRUE, so we will never need to assign na.omit(unique(b)). However, case_when() describes itself as a general vectorised if-else, and dplyr::if_else() is stricter than base::ifelse(): it performs various safety checks for type and length, and those checks are what throw this error. You would see the same error with b = if_else(TRUE, NA, na.omit(unique(b))), but not with b = ifelse(TRUE, NA, na.omit(unique(b))).
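The length problem can be checked in isolation with base R alone. This is a minimal sketch; the vector names all_na and some_na are just stand-ins for column b from case_2 and case_1:

```r
all_na  <- c(NA, NA, NA, NA)    # stand-in for column b in case_2
some_na <- c(NA, "b", NA, "b")  # stand-in for column b in case_1

length(na.omit(unique(some_na)))  # 1: a single value can be recycled
length(na.omit(unique(all_na)))   # 0: nothing left to recycle

# base::ifelse() never evaluates its `no` argument when every test is TRUE,
# so the zero-length vector goes unnoticed here
res <- ifelse(TRUE, NA, na.omit(unique(all_na)))
res
# NA
```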
if() also does not evaluate the else branch when its condition is TRUE. So you can keep your case_when() and wrap it in an if() statement inside mutate() that simply returns NA when all values are NA, so that the case_when() is never evaluated:
case_2 |> mutate(
  "a" = case_when(
    !is.na(a) ~ a,
    is.na(a) ~ na.omit(unique(a))
  ),
  "b" = if (all(is.na(b))) NA else case_when(
    !is.na(b) ~ b,
    is.na(b) ~ na.omit(unique(b))
  )
)
# a b
# <chr> <lgl>
# 1 a NA
# 2 a NA
# 3 a NA
# 4 a NA
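If you would rather avoid the if() guard, one alternative is to look up the first non-missing value by indexing, because indexing past the end of a vector returns a single NA rather than a zero-length vector. The sketch below assumes that what you want is to replace every NA with the first non-NA value in the column; fill_with_first_non_na() is a hypothetical helper name, not a dplyr or tidyr function:

```r
# Hypothetical helper: replace each NA with the first non-NA value of x
fill_with_first_non_na <- function(x) {
  # x[!is.na(x)] may be zero-length, but `[1]` then yields a single NA
  # of the same type as x, which is always safe to assign
  replacement <- x[!is.na(x)][1]
  x[is.na(x)] <- replacement
  x
}

fill_with_first_non_na(c(NA, "b", NA, "b"))  # "b" "b" "b" "b"
fill_with_first_non_na(c(NA, NA, NA, NA))    # NA NA NA NA
```

Inside the pipe this would be used as mutate(b = fill_with_first_non_na(b)). Unlike na.omit(unique(b)), it keeps only the first value when a column has several distinct non-NA values.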