rdataframedplyr

Filter data frame in R based on column string match and another column condition


i have a data frame called df :




warning    = c("HAS","NO","HAS","HAS","HAS")
validation = c("OK","OK","WARNING","WARNING","WARNING") 
a_b_c_messages1 = c(NA,NA,"good catch",NA,NA)
D_E_f_messages2 = c(NA,NA,NA,"NOT BAD",NA)
g_h_I_messages3 = c(NA,NA,NA,NA,"BETTER")
j_k_l_messages4 = c(NA,NA,NA,NA,NA)

df = tibble(warning,validation,a_b_c_messages1,
            D_E_f_messages2,g_h_I_messages3,
            j_k_l_messages4)
 A tibble: 5 × 6
  warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3 j_k_l_messages4
  <chr>   <chr>      <chr>           <chr>           <chr>           <lgl>          
1 HAS     OK         NA              NA              NA              NA             
2 NO      OK         NA              NA              NA              NA             
3 HAS     WARNING    good catch      NA              NA              NA             
4 HAS     WARNING    NA              NOT BAD         NA              NA             
5 HAS     WARNING    NA              NA              BETTER          NA  

i want to filter this df and keep the rows that the column warning contain the word "HAS" and if any column that contain the string "Message" and is not NA.

ideally i wan the resulted df to be like :

  warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3 
  <chr>   <chr>      <chr>           <chr>           <chr>                       
3 HAS     WARNING    good catch      NA              NA                          
4 HAS     WARNING    NA              NOT BAD         NA                          
5 HAS     WARNING    NA              NA              BETTER                             

Solution

  • You can use filter with if_any and then select the columns that don't contain all NA using where (which you didn't specify in your question but is evident from your desired output).

    library(dplyr)
    
    df %>%
      filter(warning=="HAS" & if_any(contains("message"), ~!is.na(.))) %>%
      select(where(function(x) !all(is.na(x))))
    

    Gives

    # A tibble: 3 × 5
      warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3
      <chr>   <chr>      <chr>           <chr>           <chr>          
    1 HAS     WARNING    good catch      NA              NA             
    2 HAS     WARNING    NA              NOT BAD         NA             
    3 HAS     WARNING    NA              NA              BETTER