
R: Dplyr: How to Check if the Value of One Variable is Contained in Another


I have hundreds of records with "state_name" (Alaska, Alabama, etc.) and need to determine whether the value of state_name is contained anywhere in another variable, "jurisd_name". I know how to search a string for a SINGLE value, e.g. "Alabama", using something like:

mutate(type_state=ifelse(grepl("Alabama",jurisd_name),1,0)) %>% 

How can I search each row to determine whether the state name (differing on each row) is contained in the jurisdiction name? In other words, I am searching for the changing VALUE of state_name, not a single state.

Is there a way to do something like:

df2 <- df %>%
  mutate(state_val=get(state_name))%>%
  mutate(type_state=ifelse(grepl(state_val,jurisd_name),1,0))

Obviously, this code doesn't work because grepl expects a single string pattern, e.g. grepl("Alabama",jurisd_name)

However, I don't know how to search for a VALUE that changes on each row of data.


Solution

  • If I understand your issue correctly, here is a solution that should be easy to adapt to your case:

    df <- tibble::tibble(a = month.name, b = c(letters[1:6], letters[1:6]))
    
    df |> 
      dplyr::mutate(check = stringr::str_detect(string = a, pattern = b))
    #> # A tibble: 12 × 3
    #>    a         b     check
    #>    <chr>     <chr> <lgl>
    #>  1 January   a     TRUE 
    #>  2 February  b     TRUE 
    #>  3 March     c     TRUE 
    #>  4 April     d     FALSE
    #>  5 May       e     FALSE
    #>  6 June      f     FALSE
    #>  7 July      a     FALSE
    #>  8 August    b     FALSE
    #>  9 September c     FALSE
    #> 10 October   d     FALSE
    #> 11 November  e     TRUE 
    #> 12 December  f     FALSE
    

    Created on 2023-05-14 with reprex v2.0.2

    Basically, if I understood correctly what you are trying to achieve, you'd just need to replace a with jurisd_name (the string to search in) and b with state_name (the pattern to look for).
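
    Applied to the column names from the question, the same idea might look like the sketch below. The sample rows are made up for illustration, and wrapping the pattern in stringr::fixed() is an assumption on my part: it matches the state name literally rather than as a regular expression, which seems to be what is wanted here. as.integer() converts TRUE/FALSE to the 1/0 coding used in the question.

    ```r
    # Hypothetical sample data; only the column names come from the question.
    df <- tibble::tibble(
      state_name  = c("Alabama", "Alaska", "Arizona"),
      jurisd_name = c("Alabama Department of Revenue",
                      "City of Juneau",
                      "Phoenix, Arizona")
    )

    # str_detect() is vectorised over both string and pattern, so each row's
    # jurisd_name is checked against that same row's state_name.
    df2 <- df |>
      dplyr::mutate(
        type_state = as.integer(
          stringr::str_detect(string = jurisd_name,
                              pattern = stringr::fixed(state_name))
        )
      )
    ```

    With these rows, type_state comes out as 1, 0, 1.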

    If you would rather use grepl, you can group the rows so that each group holds a single pattern, and swap the argument order, since grepl takes the pattern first and the string to search second:

    df |> 
      dplyr::group_by(a, b) |> 
      dplyr::mutate(check = grepl(b, a)) |> 
      dplyr::ungroup()
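
    A note on the grouped version: grepl only uses the first element of its pattern argument, so it relies on every group holding a single pattern value. An alternative sketch that avoids grouping is to call grepl once per row with mapply. The sample data is again hypothetical, and fixed = TRUE is an assumed choice so that state names are matched literally.

    ```r
    # Hypothetical sample data; only the column names come from the question.
    df <- tibble::tibble(
      state_name  = c("Alabama", "Alaska", "Alaska"),
      jurisd_name = c("Alabama Power", "Anchorage, Alaska", "Juneau")
    )

    # mapply() pairs the i-th state_name (pattern) with the i-th jurisd_name
    # (string) and calls grepl() once per pair; as.integer() gives 1/0.
    df2 <- df |>
      dplyr::mutate(
        type_state = as.integer(
          mapply(grepl, state_name, jurisd_name,
                 MoreArgs = list(fixed = TRUE))
        )
      )
    ```

    With these rows, type_state comes out as 1, 1, 0.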