I have 2 tables and each of them contains different number of rows and columns.
Illustration:
Table A:
PID | Region |
---|---|
123456 | East |
123457 | South |
123458 | West |
123459 | East |
123450 | North |
Table B:
Product | Colour | Customer ID |
---|---|---|
Tech | Yellow | 987654 |
Furniture | Red | PID123456 |
Fruit | Blue | 0012345700 |
Food | Red | PID112345 |
Clothes | Green | 123458 |
Books | Black | XXX101010 |
Shoes | Blue | 123459MD |
I want to check for every PID, if they can be found in any row of customer ID (could be exact match - the exact PID is appeared in customer ID; or the PID is included as part of the string in Customer ID). If yes then show 'mapped' in the new column of Table A, if no then show 'notmapped'.
Ideal output:
PID | Region | New column |
---|---|---|
123456 | East | mapped |
123457 | South | mapped |
123458 | West | mapped |
123459 | East | mapped |
123450 | North | notmapped |
I've tried grepl
but it gives error as 2 tables contain different number of rows. str_detect
also doesn't seem to work.
Hence would like to pick your brain and seek your advice.
Thank you!!
You can use sapply()
with grepl()
:
A$mapped = sapply(A$PID, \(s) any(grepl(s, B$`Customer ID`)))
The above returns a boolean in A$mapped
. To get the labels above, you could wrap the call to any()
in an ifelse()
, like this:
A$mapped = sapply(A$PID, \(s) ifelse(any(grepl(s, B$`Customer ID`)), "mapped", "notmapped"))
Output:
PID Region mapped
1: 123456 East mapped
2: 123457 South mapped
3: 123458 West mapped
4: 123459 East mapped
5: 123450 North notmapped
You can also do this by row. If, for example, A
and B
are data.tables, then you can do this:
A[, mapped:=fifelse(any(grepl(PID, B$`Customer ID`)), "mapped", "notmapped"), PID]
(or, using dplyr approach):
A %>%
rowwise() %>%
mutate(mapped = if_else(any(grepl(PID, B$`Customer ID`)), "mapped", "notmapped"))