r regex grepl

How do I exclude certain strings when using grepl?


I have a dataframe like this:

df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

I want to use grepl to create a variable that is TRUE when the ingredient is "salt" or "sodium chloride", but FALSE for the other values: "salt replacer", "unsalted", "veg salt", "vegetable salt".

The output should be a dataframe that looks like this:

Food    Ingredient       Salt_Present
Apple   salt             TRUE
Banana  sodium chloride  TRUE
Carrot  salt replacer    FALSE
Donut   unsalted         FALSE
Eclair  veg salt         FALSE
Flour   vegetable salt   FALSE

I am having difficulty writing the regex to achieve this.

How can I write a regex that will return true for Apple and Banana, but false for the other cases in the data?


Solution

  • Try this:

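    library(dplyr)  # only dplyr is needed here (mutate and %>%); library(tidyverse) also works
    
    df <- data.frame(
      Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
      Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
    )
    
    # TRUE only when the whole value is exactly "salt" or "sodium chloride"
    df %>% mutate(
      Salt_Present = grepl("^salt$|^sodium chloride$", Ingredient)
    )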
    

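    ^ anchors the pattern to the start of the string and $ anchors it to the end, so only values that are exactly "salt" or "sodium chloride" match. Partial matches such as "unsalted" or "veg salt" are rejected.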
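
To see what the anchors change, here is a minimal base R sketch using the Ingredient values from the question. The variable names (ingredient, unanchored, anchored, exact) are illustrative only and not part of the original answer. It also shows a grouped form of the same pattern and, as an alternative when only exact matches are wanted, %in% without any regex.

```r
# Ingredient values copied from the question
ingredient <- c("salt", "sodium chloride", "salt replacer",
                "unsalted", "veg salt", "vegetable salt")

# Without anchors, "salt" matches anywhere inside the string,
# so every value is flagged
unanchored <- grepl("salt|sodium chloride", ingredient)

# With anchors, the whole string must equal one of the alternatives.
# Grouping the alternatives lets one ^ and one $ cover both.
anchored <- grepl("^(salt|sodium chloride)$", ingredient)

# For exact matches only, %in% gives the same result without a regex
exact <- ingredient %in% c("salt", "sodium chloride")

print(data.frame(ingredient, unanchored, anchored, exact))
```

The grouped pattern is equivalent to "^salt$|^sodium chloride$" for this data. If the real data has mixed case or stray whitespace, the values would need normalizing first (for example with tolower() and trimws()), which is an assumption beyond what the question shows.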