r gsub

Remove special word using gsub


I am trying to clean some text and would like to remove the following text from a string:

googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });

For example, if

x="123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"

then

gsub("googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });", "", x)

The desired output is [1] 123456

Thank you


Solution

  • Regex approach

    You can use the following pattern.

    x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"
    
    gsub("^(\\d+).*?(\\d+)$", "\\1\\2", x)
    # [1] "123456"
    

    Explanation:

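    We keep the groups of digits at the start and end (groups 1 and 2) and discard everything in between. The part in between is matched non-greedily (.*?) so that group 2 captures all of the trailing digits; a greedy .* would consume as much as it can and leave only the final digit for group 2.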
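To make the greedy versus non-greedy difference concrete, here is a small sketch that runs both patterns on the example string. The variable names lazy and greedy are only illustrative, and perl = TRUE is an addition made here to pin the regex engine to PCRE; the call above uses R's default engine.

```r
x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"

# Non-greedy middle (.*?): matches as little as possible,
# so group 2 is free to take all of the trailing digits.
lazy <- gsub("^(\\d+).*?(\\d+)$", "\\1\\2", x, perl = TRUE)

# Greedy middle (.*): matches as much as possible and only gives back
# the single digit that group 2 needs in order to match.
greedy <- gsub("^(\\d+).*(\\d+)$", "\\1\\2", x, perl = TRUE)

print(lazy)
# [1] "123456"
print(greedy)
# [1] "1236"
```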

    Non-regex approach

    It's a little difficult to tell from a single example, but if the numbers you want are always the first and last space-separated parts of the string, you don't need regex. You can split on spaces and paste the first and last elements together:

    strsplit(x, " ", fixed = TRUE) |>
        sapply(\(m) paste0(head(m, 1), tail(m, 1)))
    # [1] "123456"
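
If the unwanted snippet is always exactly the text shown in the question, another option is to remove it as a literal string. The following is a minimal sketch under that assumption; the names ad, no_ad and result are made up for illustration. With fixed = TRUE, gsub treats the pattern as plain text, so the parentheses, braces and dots in the snippet are not interpreted as regex syntax.

```r
x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"

# The exact snippet to drop (assumed to appear verbatim in the input).
ad <- "googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });"

# Step 1: remove the snippet as a fixed string, not as a regex.
no_ad <- gsub(ad, "", x, fixed = TRUE)

# Step 2: remove the leftover spaces so the digits are joined together.
result <- gsub(" ", "", no_ad, fixed = TRUE)

print(result)
# [1] "123456"
```

Unlike the two approaches above, this version does not assume that the string starts and ends with digits, but it only works when the snippet matches character for character.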