rstringmatchdetect

How to check if the string in one column is included in another column but 2 columns contain different rows?


I have 2 tables and each of them contains different number of rows and columns.

Illustration:

Table A:

PID Region
123456 East
123457 South
123458 West
123459 East
123450 North

Table B:

Product Colour Customer ID
Tech Yellow 987654
Furniture Red PID123456
Fruit Blue 0012345700
Food Red PID112345
Clothes Green 123458
Books Black XXX101010
Shoes Blue 123459MD

I want to check for every PID, if they can be found in any row of customer ID (could be exact match - the exact PID is appeared in customer ID; or the PID is included as part of the string in Customer ID). If yes then show 'mapped' in the new column of Table A, if no then show 'notmapped'.

Ideal output:

PID Region New column
123456 East mapped
123457 South mapped
123458 West mapped
123459 East mapped
123450 North notmapped

I've tried grepl but it gives error as 2 tables contain different number of rows. str_detect also doesn't seem to work.

Hence would like to pick your brain and seek your advice.

Thank you!!


Solution

  • You can use sapply() with grepl():

    A$mapped = sapply(A$PID, \(s) any(grepl(s, B$`Customer ID`)))
    

    The above returns a boolean in A$mapped. To get the labels above, you could wrap the call to any() in an ifelse(), like this:

    A$mapped = sapply(A$PID, \(s) ifelse(any(grepl(s, B$`Customer ID`)), "mapped", "notmapped"))
    

    Output:

          PID Region    mapped
    1: 123456   East    mapped
    2: 123457  South    mapped
    3: 123458   West    mapped
    4: 123459   East    mapped
    5: 123450  North notmapped
    

    You can also do this by row. If, for example, A and B are data.tables, then you can do this:

    A[, mapped:=fifelse(any(grepl(PID, B$`Customer ID`)), "mapped", "notmapped"), PID]
    

    (or, using dplyr approach):

    A %>% 
      rowwise() %>% 
      mutate(mapped = if_else(any(grepl(PID, B$`Customer ID`)), "mapped", "notmapped"))