I'm trying to embed a match (or equivalent --- I just migrated from Excel where there is liberal use of Match) function inside a mutate()
. Suppose we start with a data frame df1
that looks like this, generated by the code below it:
ID Status
1 1 N
2 2 Y
3 3 Y
4 4 N
5 5 Y
df1 <- data.frame(
ID = c(1,2,3,4,5),
Status = c("N","Y","Y","N","Y")
)
I use dplyr's mutate()
function to generate a new column based on the value of the right-most "Status" column. Below is the new tmp
data frame output and the dplyr code that generates it (this is a super simple example that applies to my larger issue -- just go with this):
ID Status Flag
1 1 N No
2 2 Y Yes
3 3 Y Yes
4 4 N No
5 5 Y Yes
tmp <-
df1 %>%
mutate("Flag"=case_when(Status=="Y"~"Yes",TRUE~"No"))
I'm trying to embed the equivalent of a "match" function in the above mutate()
, such that the value inserted into the Flag column depends on the ID in the df1
data frame matching the ID in these 2 additional data frames:
Status1 <- data.frame(ID = c(2,3))
Status2 <- data.frame(ID = c(5))
So for example, with an ID matching function, my tmp
output would look like this:
ID Status Flag [Flag explained]
1 1 N No
2 2 Y Yes - Status1 Since the ID in the df1 data frame matches one of the ID's in the Status 1 data frame
3 3 Y Yes - Status1 Same as immediately above
4 4 N No
5 5 Y Yes - Status2 Since the ID in the df1 data frame matches the ID in the Status 2 data frame
Any ideas for matching multiple data frames in this manner in dplyr?
In the actual data I'm working with, there are the 5 data frames to match against, not 2 like in this example. Also, in my actual 5 matching tables, all ID's are mutually exclusive (no ID's repeated in the matching tables). In my actual data, the equivalent of the Status and Flag columns are also character strings, not numeric values.
An alternative way to do it could be like this:
library(tidyverse)
df1 %>%
mutate("Flag" = case_when(
ID %in% Status1$ID ~ "Status1",
ID %in% Status2$ID ~ "Status2",
TRUE ~ Status
))
#> ID Status Flag
#> 1 1 N N
#> 2 2 Y Status1
#> 3 3 Y Status1
#> 4 4 N N
#> 5 5 Y Status2
Created on 2022-01-07 by the reprex package (v2.0.1)
Data:
df1 <- data.frame(
ID = c(1, 2, 3, 4, 5),
Status = c("N", "Y", "Y", "N", "Y")
)
Status1 <- data.frame(ID = c(2, 3))
Status2 <- data.frame(ID = c(5))