i have a data frame called df :
warning = c("HAS","NO","HAS","HAS","HAS")
validation = c("OK","OK","WARNING","WARNING","WARNING")
a_b_c_messages1 = c(NA,NA,"good catch",NA,NA)
D_E_f_messages2 = c(NA,NA,NA,"NOT BAD",NA)
g_h_I_messages3 = c(NA,NA,NA,NA,"BETTER")
j_k_l_messages4 = c(NA,NA,NA,NA,NA)
df = tibble(warning,validation,a_b_c_messages1,
D_E_f_messages2,g_h_I_messages3,
j_k_l_messages4)
A tibble: 5 × 6
warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3 j_k_l_messages4
<chr> <chr> <chr> <chr> <chr> <lgl>
1 HAS OK NA NA NA NA
2 NO OK NA NA NA NA
3 HAS WARNING good catch NA NA NA
4 HAS WARNING NA NOT BAD NA NA
5 HAS WARNING NA NA BETTER NA
i want to filter this df and keep the rows that the column warning contain the word "HAS" and if any column that contain the string "Message" and is not NA.
ideally i wan the resulted df to be like :
warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3
<chr> <chr> <chr> <chr> <chr>
3 HAS WARNING good catch NA NA
4 HAS WARNING NA NOT BAD NA
5 HAS WARNING NA NA BETTER
You can use filter
with if_any
and then select the columns that don't contain all NA using where
(which you didn't specify in your question but is evident from your desired output).
library(dplyr)
df %>%
filter(warning=="HAS" & if_any(contains("message"), ~!is.na(.))) %>%
select(where(function(x) !all(is.na(x))))
Gives
# A tibble: 3 × 5
warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3
<chr> <chr> <chr> <chr> <chr>
1 HAS WARNING good catch NA NA
2 HAS WARNING NA NOT BAD NA
3 HAS WARNING NA NA BETTER