rpattern-matchingstringrgrepl

R: Partial matching values in a vector with partial match in df


I am trying to do some inventory management, I have a list of catalog numbers of materials we need and a list of managed stock onsite. I have a vector of catalog numbers (num) which are accurate to original retailers, however our internal inventory management system catalog numbers (df$Cat_no) are slightly different, I want to use the vector of catalog numbers against the inventory management system stock df to see what we are actually using and what can be discontinued.

Example

num <- c( "B12-34","456-31",  "321E", "AB987-10", "12-34-AA", "ABC123", "XYZ1234")

Cat_no <- c("A123","BAB12-34","BAB12-35","FF456-12","FF456-31","CAD321E","SSAB-987","MM12-34-AA", "MM12-35-AB","ABC123")

Item <- c("Plate","teaspoon","Desert spoon","Fork","Salad fork","napkin","lightbulb","Jug","Glass","Shaker")

df <- as.data.frame(Cat_no, Item)

There should be 6 successful hits in the df against the vector. Ive tried working with stringr and grepl but I cannot seem to get the right regex argument.


Solution

  • library(tidyverse)
    
    num <- c( "B12-34","456-31",  "321E", "AB987-10", "12-34-AA", "ABC123", "XYZ1234")
    
    df <- tibble(cat_no = c("A123","BAB12-34","BAB12-35","FF456-12","FF456-31","CAD321E","SSAB-987","MM12-34-AA", "MM12-35-AB","ABC123"), 
                 item = c("Plate","teaspoon","Desert spoon","Fork","Salad fork","napkin","lightbulb","Jug","Glass","Shaker"))
    
    df %>%
      mutate(match = str_detect(cat_no, str_c(num, collapse = "|"))) 
    
    # A tibble: 10 × 3
       cat_no     item         match
       <chr>      <chr>        <lgl>
     1 A123       Plate        FALSE
     2 BAB12-34   teaspoon     TRUE 
     3 BAB12-35   Desert spoon FALSE
     4 FF456-12   Fork         FALSE
     5 FF456-31   Salad fork   TRUE 
     6 CAD321E    napkin       TRUE 
     7 SSAB-987   lightbulb    FALSE
     8 MM12-34-AA Jug          TRUE 
     9 MM12-35-AB Glass        FALSE
    10 ABC123     Shaker       TRUE 
    
    # If you want to know the matched num 
    
    df %>%
      mutate(matched_num = str_extract(cat_no, str_c(num, collapse = "|")))
    
    # A tibble: 10 × 3
       cat_no     item         matched_num
       <chr>      <chr>        <chr>      
     1 A123       Plate        NA         
     2 BAB12-34   teaspoon     B12-34     
     3 BAB12-35   Desert spoon NA         
     4 FF456-12   Fork         NA         
     5 FF456-31   Salad fork   456-31     
     6 CAD321E    napkin       321E       
     7 SSAB-987   lightbulb    NA         
     8 MM12-34-AA Jug          12-34-AA   
     9 MM12-35-AB Glass        NA         
    10 ABC123     Shaker       ABC123