r regex grepl

Using regex and grepl to detect words starting with a specific pattern


Why does grepl("see*", "file SEC", ignore.case = TRUE) return TRUE?

I am trying to find all words that start with "see", such as See, seeing, seen, etc., and remove them. The string above, "file SEC", does not contain such a word, yet TRUE is returned.


Solution

  • Use a word boundary (\\b)

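    The pattern "see*" means "se" followed by zero or more "e"s (the "*" applies only to the preceding "e"). grepl() searches for the pattern anywhere in the string, so with ignore.case = TRUE the "SE" in "SEC" matches.

    Drop the "*". If the string as a whole should start with "see", anchor the pattern with "^":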

    grepl("^see", "file SEC", ignore.case = TRUE)
    
    [1] FALSE
    

    The "^" anchor only matches at the start of the whole string. To detect words that start with the pattern anywhere inside a multi-word string, while excluding words that merely contain it, use a word boundary (\\b) instead:

    grepl("\\bSee", c("file SEC", "See", "seeing", "seen", "he was seen", "He did not forsee the event"), ignore.case = TRUE)
    [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
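
To make the difference concrete, here is a minimal sketch that first shows which substring "see*" actually matches in each test string, and then removes whole words that start with "see", which is what the question is ultimately after. The helper name remove_see_words, the whitespace cleanup, and the use of perl = TRUE are my own assumptions for illustration and are not part of the original answer.

```r
x <- c("file SEC", "See", "seeing", "seen", "he was seen",
       "He did not forsee the event")

# 1. What does "see*" match? "se" plus zero or more "e", anywhere in the string.
m <- regexpr("see*", x, ignore.case = TRUE)
hit <- as.vector(m) > 0                 # regexpr() returns -1 when nothing matches
matched <- rep(NA_character_, length(x))
matched[hit] <- regmatches(x, m)        # regmatches() returns only the hits
matched
# "SE" is picked up in "file SEC", and the "see" inside "forsee" matches too.

# 2. Remove whole words that start with "see" (hypothetical helper).
remove_see_words <- function(text) {
  # \\b   word boundary, so "forsee" is left alone
  # see   the literal prefix
  # \\w*  the rest of the word, so "seeing" and "seen" are removed entirely
  # perl = TRUE is an assumption here; it selects the PCRE engine.
  out <- gsub("\\bsee\\w*", "", text, ignore.case = TRUE, perl = TRUE)
  # Collapse whitespace left behind by the removed words.
  trimws(gsub("\\s+", " ", out, perl = TRUE))
}

remove_see_words(x)
```

With this pattern "file SEC" and "He did not forsee the event" should come back unchanged, "he was seen" should become "he was", and the single-word inputs should become empty strings. If empty results are unwanted, they can be filtered out afterwards with nzchar().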