I have the following toy data frame.
toy.df <- data.frame(Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
col1 = c("pos", "neg", "NA", "pos","neg", "NA", "pos"),
col2 = c("pos", "pos", "NA", "pos","neg","NA", "neg"),
col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg"))
I would like to mutate a new column that check the values of all columns per row. If they are all "pos" or "NA" mutate "pos", if they are all "neg" or "NA" mutate "neg" and if they are "pos" or "neg" or "NA" mutate "both".
The new column looks as follows:
col4 <- c("pos", "both", "pos", "pos","neg", "neg","both")
Here is the final data frame:
Name col1 col2 col3 col4
group1 pos pos pos pos
group2 neg pos NA both
group3 NA NA pos pos
group4 pos pos NA pos
group5 neg neg neg neg
group6 NA NA neg neg
group7 pos neg neg both
Since the "NA" in your data frame is literal "NA", we need to turn it into real missing value NA
by na_if
. Then use case_when
to supply the conditions for new column assignment. We need rowwise
for it to work in every row. The final TRUE ~ "unknown"
in case_when
captures strings other than "pos" and "neg" in col1
to col3
.
I added two entries to show the behaviour when all rows are NA
, or when there's a typo in the columns.
library(dplyr)
toy.df %>%
rowwise() %>%
mutate(across(everything(), ~na_if(.x, "NA")),
col4 = case_when(all(is.na(c_across(col1:col3))) ~ NA,
all(c_across(col1:col3) == "pos", na.rm = T) ~ "pos",
all(c_across(col1:col3) == "neg", na.rm = T) ~ "neg",
all(c_across(col1:col3) %in% c("pos", "neg", NA)) ~ "both",
TRUE ~ "unknown")) %>%
ungroup()
# A tibble: 9 × 5
Name col1 col2 col3 col4
<chr> <chr> <chr> <chr> <chr>
1 group1 pos pos pos pos
2 group2 neg pos NA both
3 group3 NA NA pos pos
4 group4 pos pos NA pos
5 group5 neg neg neg neg
6 group6 NA NA neg neg
7 group7 pos neg neg both
8 group8 NA NA NA NA
9 group9 pos pos typo unknown
toy.df <- structure(list(Name = c("group1", "group2", "group3", "group4",
"group5", "group6", "group7", "group8", "group9"), col1 = c("pos",
"neg", "NA", "pos", "neg", "NA", "pos", NA, "pos"), col2 = c("pos",
"pos", "NA", "pos", "neg", "NA", "neg", NA, "pos"), col3 = c("pos",
"NA", "pos", "NA", "neg", "neg", "neg", NA, "typo")), class = "data.frame", row.names = c(NA,
-9L))