
Use grepl() to match two separator-delimited parts of a string against two column values of a data frame simultaneously


I have the data frames below, along with the desired output.


df <- data.frame(
  col1 = c("abc_1_102", "abc_1_103", "xyz_1_104")
)


selection <- data.frame(col1 = c("abc", "xyz"), col2 = c("102", "106"))

Desired output

       col1      col9
1 abc_1_102    SELECT
2 abc_1_103 NOTSELECT
3 xyz_1_104 NOTSELECT

How can I achieve this using the grepl() function in R?

What I have tried:

df$col2 <- ifelse(grepl(paste("^", selection$col1, "$", collapse = "|"), df$col1) &
                  grepl(paste("^", selection$col2, "$", collapse = "|"), df$col1),
                  "SELECT", "NOTSELECT")


print(df)

       col1      col2
1 abc_1_102 NOTSELECT
2 abc_1_103 NOTSELECT
3 xyz_1_104 NOTSELECT

Here the result is incorrect: the first row of selection (abc, 102) matches the first and third "_"-separated elements of row 1 of df (abc_1_102), so that row should be SELECT.
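A quick way to see why every row comes out NOTSELECT is to print the patterns the attempt builds. This sketch reuses the question's selection data frame; p1, p2, a and b are illustrative names, and "xyz_1_102" is a made-up value used only to show the row-pairing problem.

```r
selection <- data.frame(col1 = c("abc", "xyz"), col2 = c("102", "106"))

# paste() defaults to sep = " ", so spaces end up inside every alternative
p1 <- paste("^", selection$col1, "$", collapse = "|")
print(p1)  # "^ abc $|^ xyz $"

# Even with paste0(), "^abc$" only matches a string that is exactly "abc"
p2 <- paste0("^", selection$col1, "$", collapse = "|")
grepl(p2, "abc_1_102")  # FALSE

# Without anchors the two grepl() calls are still independent, so a value can
# pass by combining col1 from one selection row with col2 from another row
a <- paste(selection$col1, collapse = "|")  # "abc|xyz"
b <- paste(selection$col2, collapse = "|")  # "102|106"
grepl(a, "xyz_1_102") & grepl(b, "xyz_1_102")  # TRUE, though (xyz, 102) is not a row
```

The last line is why the two columns need to be combined into a single pattern per selection row rather than tested separately.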


Solution

  • A more complete example would help, but as I understand it, you want to check whether each value in df$col1 contains any row of selection (its col1 and col2 joined by a separator such as _1_), setting the new column to "SELECT" when it does and "NOTSELECT" otherwise.

    # Builds "abc.*102|xyz.*106": one alternative per row of selection
    pattern <- paste(selection$col1, selection$col2, sep = ".*", collapse = "|")
    df$col9 <- ifelse(grepl(pattern, df$col1), "SELECT", "NOTSELECT")
    

    Notes:

    1. If the exact separator matters, replace .* with _1_ (i.e. use sep = "_1_" in paste()).
    2. Instead of "SELECT" and "NOTSELECT", consider storing TRUE and FALSE: it is shorter, simpler, harder to get wrong, and works directly in subsetting and logical tests.
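Putting the answer and note 2 together, here is a self-contained sketch. The loose pattern is the one from the answer; the strict pattern is an extra assumption, not part of the original answer: it anchors both ends and allows exactly one "_"-delimited middle element, so a value such as "abc_1_1023" (hypothetical) is not accepted. The names loose, strict and selected are illustrative.

```r
df <- data.frame(col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"))
selection <- data.frame(col1 = c("abc", "xyz"), col2 = c("102", "106"))

# Pattern from the answer: "abc.*102|xyz.*106"
loose <- paste(selection$col1, selection$col2, sep = ".*", collapse = "|")

# Stricter variant (assumption): "^abc_[^_]+_102$|^xyz_[^_]+_106$"
strict <- paste0("^", selection$col1, "_[^_]+_", selection$col2, "$", collapse = "|")

# Note 2: keep a logical column instead of "SELECT"/"NOTSELECT"
df$selected <- grepl(strict, df$col1)
print(df)

# The loose pattern accepts a value that the strict one rejects
grepl(loose, "abc_1_1023")   # TRUE
grepl(strict, "abc_1_1023")  # FALSE
```

Both patterns assume the values in selection contain no regex metacharacters (such as "." or "+"); if they might, those characters would need escaping before pasting them into a pattern.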