rtidyverse

Drop a column if all of its values are NA


I've got a database query running from R which can have multiple columns filled with NA, some of these columns I need further down the line but one specific column can be dropped if all values are NA.

I usually use purrr::discard(~all(is.na(.))) to drop the columns which are all NA, but since this dataframe can contain multiple columns with NA where I only want to remove one I'm struggling to make this specific to the column in a tidyverse solution.

I've currently got this workaround:

  if(sum(is.na(Orders$Originator)) == nrow(Orders)) {
    
    Orders <- Orders %>%
      select(-Originator)
    
  }

But it would improve the readability if I can have this in a tidyverse solution. Hope someone can be of help!

Thanks!


Solution

  • Using example data:

    df <- data.frame(
      x = c(1,2,NA),
      y = NA,
      z = c(3,4,5)
    )
    

    Here column y is the target column to check if all is.na. Your if and else will be contained in curly braces. The braces will suppress the pipe from using the first argument in a function. Note the else will keep your data frame in the pipe if the condition is false.

    library(tidyverse)
    
    df %>%
      { if (all(is.na(.$y))) select(., -y) else . }
    

    Output

       x z
    1  1 3
    2  2 4
    3 NA 5