rdplyrmatchmutated

How to use a match function with a mutate function?


I'm trying to embed a match (or equivalent --- I just migrated from Excel where there is liberal use of Match) function inside a mutate(). Suppose we start with a data frame df1 that looks like this, generated by the code below it:

  ID Status
1  1      N
2  2      Y
3  3      Y
4  4      N
5  5      Y

df1 <- data.frame(
  ID = c(1,2,3,4,5),
  Status = c("N","Y","Y","N","Y")
)

I use dplyr's mutate() function to generate a new column based on the value of the right-most "Status" column. Below is the new tmp data frame output and the dplyr code that generates it (this is a super simple example that applies to my larger issue -- just go with this):

  ID Status Flag
1  1      N   No
2  2      Y  Yes
3  3      Y  Yes
4  4      N   No
5  5      Y  Yes

tmp <- 
  df1 %>% 
  mutate("Flag"=case_when(Status=="Y"~"Yes",TRUE~"No"))

I'm trying to embed the equivalent of a "match" function in the above mutate(), such that the value inserted into the Flag column depends on the ID in the df1 data frame matching the ID in these 2 additional data frames:

Status1 <- data.frame(ID = c(2,3))
Status2 <- data.frame(ID = c(5)) 

So for example, with an ID matching function, my tmp output would look like this:

  ID Status Flag             [Flag explained]
1  1      N   No
2  2      Y  Yes - Status1   Since the ID in the df1 data frame matches one of the ID's in the Status 1 data frame
3  3      Y  Yes - Status1   Same as immediately above
4  4      N   No
5  5      Y  Yes - Status2   Since the ID in the df1 data frame matches the ID in the Status 2 data frame

Any ideas for matching multiple data frames in this manner in dplyr?

In the actual data I'm working with, there are the 5 data frames to match against, not 2 like in this example. Also, in my actual 5 matching tables, all ID's are mutually exclusive (no ID's repeated in the matching tables). In my actual data, the equivalent of the Status and Flag columns are also character strings, not numeric values.


Solution

  • An alternative way to do it could be like this:

    library(tidyverse)
    
    df1 %>%
      mutate("Flag" = case_when(
        ID %in% Status1$ID ~ "Status1",
        ID %in% Status2$ID ~ "Status2",
        TRUE ~ Status
      ))
    #>   ID Status    Flag
    #> 1  1      N       N
    #> 2  2      Y Status1
    #> 3  3      Y Status1
    #> 4  4      N       N
    #> 5  5      Y Status2
    

    Created on 2022-01-07 by the reprex package (v2.0.1)

    Data:

    df1 <- data.frame(
      ID = c(1, 2, 3, 4, 5),
      Status = c("N", "Y", "Y", "N", "Y")
    )
    Status1 <- data.frame(ID = c(2, 3))
    Status2 <- data.frame(ID = c(5))