Tags: r, regex, string, gsub

Match several substitution patterns with a single expression


I know from the R documentation that gsub can apply several transformations in a single call, such as upper- and lowercase conversion:

gsub("([a-z]*)([A-Z]*)", "\\U\\1\\L\\2", "upper LOWER", perl=TRUE)

[1] "UPPER lower"

I have a minimal example, a string "STTS", in which I want to replace "S" with "TRUE" and "T" with "FALSE". I cannot do it sequentially because the replacements collide: "TRUE" contains a "T" and "FALSE" contains an "S", so the second pass would rewrite the output of the first.
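To make the collision concrete, here is a minimal sketch of the sequential approach (the names step1 and step2 are only for illustration):

```r
# Sequential replacement: the second pass also rewrites the "T" inside "TRUE".
step1 <- gsub("S", "TRUE", "STTS")   # first pass: S -> TRUE
step2 <- gsub("T", "FALSE", step1)   # second pass: every T -> FALSE, including those in "TRUE"
print(step1)
print(step2)
```

The second call cannot tell an original "T" from one introduced by the first call, so the result is not the intended one.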

I have tried this code:

gsub("([S]*)\\1([T]*)\\2", "TRUE \\1FALSE \\2", "STTS",perl=TRUE)

and received [1] "TRUE FALSE STRUE FALSE TSTRUE FALSE "

instead of "TRUE FALSE FALSE TRUE"


Solution

  • Base R's gsub does not accept a function as the replacement, so it cannot choose the replacement based on the text that was matched.

    You can use stringr::str_replace_all to apply a conditional replacement here:

    library(stringr)
    trimws(str_replace_all("STTS", "[ST]", function(x) ifelse(x=="S", "TRUE ", "FALSE ")))
    ## => [1] "TRUE FALSE FALSE TRUE"
    

    Using R version 4.3.1 (2023-06-16 ucrt) here.

    The [ST] pattern matches either S or T, and the replacement function decides what each matched letter becomes.

    Because every replacement ends with a space, trimws() is used to strip the trailing one.

    If you prefer the gsubfn package, you can use:

    library(gsubfn)
    trimws(gsubfn("[ST]", ~ ifelse(x=="S", "TRUE ", "FALSE "), "STTS"))
    ## => [1] "TRUE FALSE FALSE TRUE"
    

    It works the same way: the formula is applied to each match, and its result is used as the replacement.
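If adding a package is not an option, the same match-then-decide idea can be sketched in base R with gregexpr() and the regmatches<- replacement function. This is an alternative to the approaches above, not part of them, and the variable names are only for illustration:

```r
x <- "STTS"

# Locate every S or T in a single pass over the original string.
m <- gregexpr("[ST]", x)

# Replace each match according to the letter that was actually matched.
regmatches(x, m) <- lapply(
  regmatches(x, m),
  function(hit) ifelse(hit == "S", "TRUE ", "FALSE ")
)

# Each replacement ends with a space, so drop the trailing one.
result <- trimws(x)
print(result)
```

Since all matches are found before any replacement is written, the inserted "TRUE" and "FALSE" strings are never rescanned, which avoids the collision of a sequential approach.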