
Is there an R function to escape a string for regex characters?


I want to build a regex by substituting in some strings to search for, so these strings need to be escaped before I put them into the regex. That way the pattern still works if a searched-for string contains regex metacharacters.

Some languages have a function that does this for you (e.g. Python's re.escape: https://stackoverflow.com/a/10013356/1900520). Does R have such a function?

For example (escape() is a made-up function):

x = "foo[bar]"
y = escape(x) # y should now be "foo\\[bar\\]"
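To see why escaping matters, here is a short sketch using base R's grepl() with its default regex engine. Left unescaped, the brackets in x form a character class, so the pattern matches "foob" but not the literal text "foo[bar]".

```r
x <- "foo[bar]"

# Unescaped, "[bar]" is a character class matching one of b, a or r,
# so the pattern means "foo followed by b, a or r".
grepl(x, "foo[bar]")  # FALSE: "foo" is followed by "[", not b/a/r
grepl(x, "foob")      # TRUE

# Escaping the brackets by hand makes them match literally.
grepl("foo\\[bar\\]", "foo[bar]")  # TRUE
```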

Solution

  • I've written an R version of Perl's quotemeta function:

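    library(stringr)

    # Prefix every non-word character (anything matching \W) with a
    # backslash, mirroring Perl's quotemeta.
    quotemeta <- function(string) {
      str_replace_all(string, "(\\W)", "\\\\\\1")
    }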
    

    I always use the Perl flavor of regexps, so this works for me. I don't know whether it works for R's default ("extended") regexps.

    Edit: I found the source explaining why this works. It's in the Quoting Metacharacters section of the perlre manpage:

    This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a string that you want to use for a pattern. Simply quote all non-"word" characters:

    $pattern =~ s/(\W)/\\$1/g;
    

    As you can see, the R code above is a direct translation of this same substitution (after a trip through backslash hell). The manpage also says (emphasis mine):

    Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric.

    which reinforces my point that this solution is only guaranteed for PCRE.
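If you'd rather not depend on stringr, the same substitution can be sketched with base R's gsub() and perl = TRUE, which makes gsub() use PCRE, the flavor the perlre quote is about. The name quotemeta_base is purely illustrative; it is not an existing R function. This sketch only targets PCRE; whether the same escaping is safe for R's default engine remains the open question raised above.

```r
# Base-R sketch of the quotemeta idea (quotemeta_base is a hypothetical name).
# With perl = TRUE, gsub() uses PCRE: "(\\W)" captures each non-word
# character, and the replacement "\\\\\\1" is the text \\\1, i.e. a literal
# backslash followed by the captured character.
quotemeta_base <- function(string) {
  gsub("(\\W)", "\\\\\\1", string, perl = TRUE)
}

x <- "foo[bar]"
y <- quotemeta_base(x)
writeLines(y)  # prints foo\[bar\]

# Use the escaped string inside a larger pattern, with the same PCRE flavor.
pattern <- paste0("^", y, "\\d+$")
grepl(pattern, "foo[bar]42", perl = TRUE)  # TRUE
grepl(pattern, "foob42", perl = TRUE)      # FALSE
```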