
Use mutate and case_when in empty columns


A column in my data can be in one of two states:

1. It contains both NA and non-NA values.

2. It contains only NA.

I need a single mutate() call that handles both states in the same column. It works when the column has some values, but when the column is completely empty (only NA) I get an error.

How can I handle a completely empty column inside mutate() with case_when(), while still covering the other case?

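    library(dplyr)  # also provides tibble() and %>%

    # Column b has both NA and non-NA values
    case_1 <- tibble("a" = c("a","a", "a","a"),
                     "b" = c(NA,"b", NA,"b"))

    # Column b has only NA values
    case_2 <- tibble("a" = c("a","a", "a","a"),
                     "b" = c(NA,NA,NA,NA))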

    case_1 <- case_1 %>% mutate("a" = case_when(!is.na(a) ~ a,
                                                is.na(a) ~ na.omit(unique(a))),
                                "b" = case_when(!is.na(b) ~ b,
                                                is.na(b) ~ na.omit(unique(b))))

    case_2 <- case_2 %>% mutate("a" = case_when(!is.na(a) ~ a,
                                                is.na(a) ~ na.omit(unique(a))),
                                "b" = case_when(!is.na(b) ~ b,
                                                is.na(b) ~ na.omit(unique(b))))

The first case (case_1) works fine, but case_2 fails with an error.


Solution

  • It looks as if you're trying to fill the NA values from the nearest non-NA value, both forwards and backwards. You can achieve this with tidyr::fill(), as long as you specify the .direction:

    case_1 |>
        tidyr::fill(b, .direction = "downup")
    
    #   a     b
    #   <chr> <chr>
    # 1 a     b
    # 2 a     b
    # 3 a     b
    # 4 a     b
    
    case_2 |>
        tidyr::fill(b, .direction = "downup")
    
    #   a     b
    #   <chr> <lgl>
    # 1 a     NA
    # 2 a     NA
    # 3 a     NA
    # 4 a     NA
    

    However, in the general case where you want to use dplyr::case_when() and all the values in b might be NA, you can't have na.omit(unique(b)) on the right-hand side, because it then produces a zero-length vector. A zero-length vector cannot be assigned to a data frame column of non-zero length, and it cannot be recycled to the correct length. case_when() evaluates every right-hand side, even those whose condition never matches. For example:

    case_2 |>
        mutate(
            "b" = case_when(
                TRUE ~ NA,
                FALSE ~ na.omit(unique(b))
            )
        )
    # Error in `mutate()`:
    # ℹ In argument: `b = case_when(TRUE ~ NA, FALSE ~ na.omit(unique(b)))`.
    # Caused by error:
    # ! `b` must be size 4 or 1, not 0.
    # Run `rlang::last_trace()` to see where the error occurred.
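
    The zero-length result can be checked outside mutate(). This is a minimal base R sketch; the vectors b and b_ok are stand-ins for the b columns of case_2 and case_1, not objects from the question:

    ```r
    # Stand-in for column `b` of case_2: every value is NA.
    b <- c(NA, NA, NA, NA)

    # unique() collapses the four NAs into one; na.omit() then drops it.
    candidates <- na.omit(unique(b))
    length(candidates)
    # [1] 0

    # Stand-in for column `b` of case_1: one distinct non-NA value remains,
    # and a length-1 vector can be recycled to the column length.
    b_ok <- c(NA, "b", NA, "b")
    length(na.omit(unique(b_ok)))
    # [1] 1
    ```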
    

    Clearly, FALSE can never be TRUE, so na.omit(unique(b)) is never actually needed. However, case_when() describes itself as a general vectorised if-else, and dplyr::if_else() is stricter than base::ifelse(): it runs safety checks for type and length, and those checks are what throw this error. You would see a similar size error with b = if_else(TRUE, NA, na.omit(unique(b))), but not with b = ifelse(TRUE, NA, na.omit(unique(b))).
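
    The difference can be seen outside a data frame as well. In this small sketch, empty is an assumed stand-in for na.omit(unique(b)) on an all-NA column, and the "error" string returned by the handler is only a marker chosen for this example:

    ```r
    library(dplyr)

    # Assumed stand-in for na.omit(unique(b)) when b holds only NA.
    empty <- character(0)

    # base::ifelse() only uses `no` when some test value is FALSE,
    # so the zero-length vector is never touched here.
    base_result <- ifelse(TRUE, NA, empty)
    base_result
    # [1] NA

    # dplyr::if_else() checks the size of both branches up front,
    # so the same call fails; tryCatch() records that it did.
    strict_result <- tryCatch(
      if_else(TRUE, NA, empty),
      error = function(e) "error"
    )
    strict_result
    # [1] "error"
    ```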

    Like ifelse(), a plain if () ... else only evaluates the branch it takes. So you can keep your case_when() and wrap it in an if () inside mutate() that simply returns NA when all values are NA, so the case_when() is never evaluated for that column:

    case_2 |> mutate(
        "a" = case_when(
            !is.na(a) ~ a,
            is.na(a) ~ na.omit(unique(a))
        ),
        "b" = if (all(is.na(b))) NA else case_when(
                !is.na(b) ~ b,
                is.na(b) ~ na.omit(unique(b))
            )
    )
    
    #   a     b
    #   <chr> <lgl>
    # 1 a     NA
    # 2 a     NA
    # 3 a     NA
    # 4 a     NA
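
    If the same rule has to be applied to several columns, the check can be moved into a small function. The following is only a sketch: fill_with_only_value() is a hypothetical helper, not part of dplyr, and it assumes each column has at most one distinct non-NA value, as in the question's data. It uses plain subsetting instead of case_when(), so an all-NA column is returned unchanged:

    ```r
    library(dplyr)

    # Hypothetical helper (not part of dplyr): fill NAs with the column's single
    # non-NA value, and return the column untouched when it has no such value.
    fill_with_only_value <- function(x) {
      values <- unique(x[!is.na(x)])
      if (length(values) == 0) {
        return(x)  # all NA: nothing to fill with
      }
      # Assumption: at most one distinct non-NA value per column.
      stopifnot(length(values) == 1)
      x[is.na(x)] <- values
      x
    }

    case_1 <- tibble(a = c("a", "a", "a", "a"), b = c(NA, "b", NA, "b"))
    case_2 <- tibble(a = c("a", "a", "a", "a"), b = c(NA, NA, NA, NA))

    # The same call now covers both the partly filled and the empty column.
    filled_1 <- case_1 |> mutate(across(c(a, b), fill_with_only_value))
    filled_2 <- case_2 |> mutate(across(c(a, b), fill_with_only_value))
    ```

    The stopifnot() call makes the single-value assumption explicit: a column with two or more distinct values stops with an error rather than being filled arbitrarily.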