I am trying to clean some text and would like to remove the following text from a string:
googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });
For example, if
x="123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"
then
gsub("googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });", "", x)
The desired output is [1] 123456
Thank you
You can use the following pattern.
x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"
gsub("^(\\d+).*?(\\d+)$", "\\1\\2", x)
# [1] "123456"
We keep the groups of digits at the start and end (groups 1 and 2) and discard everything in between. The match in between is non-greedy so that the second group captures all of the trailing digits rather than only the last one.
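To see why the non-greedy quantifier matters, here is a small sketch that runs the same substitution with a greedy and a non-greedy middle part. The variable names lazy and greedy are only for illustration; the input is the string from the question.

```r
x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"

# Non-greedy .*? stops as early as possible, so group 2 gets all trailing digits
lazy <- gsub("^(\\d+).*?(\\d+)$", "\\1\\2", x)

# Greedy .* consumes as much as it can, leaving a single digit for group 2
greedy <- gsub("^(\\d+).*(\\d+)$", "\\1\\2", x)

print(lazy)
print(greedy)
```

With the greedy version the middle part swallows "45", so only the final digit survives in the second group.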
It's a little difficult to tell with one example, but if the numbers you want are always the first and last space-separated pieces of the string, you don't need regex. You can just split on spaces and take the first and last element (the native pipe and the \(m) shorthand need R 4.1 or later):
strsplit(x, " ", fixed = TRUE) |>
sapply(\(m) paste0(head(m, 1), tail(m, 1)))
# [1] "123456"
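If the goal is instead to remove exactly that ad snippet wherever it appears, as the question describes, another option is a literal match. This is a minimal sketch, assuming the snippet always appears verbatim; the names ad, no_ad and result are made up for the example. With fixed = TRUE the pattern is treated as a plain string, so the parentheses, braces and dots need no escaping.

```r
x <- "123 googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); }); 456"

# The exact text to remove (assumed to appear verbatim in the input)
ad <- "googletag.cmd.push(function() { googletag.display('div-gpt-ad-1513202928332-3'); });"

# fixed = TRUE: match the pattern literally instead of as a regular expression
no_ad <- gsub(ad, "", x, fixed = TRUE)

# Remove the whitespace left behind to get the desired output
result <- gsub("\\s+", "", no_ad)
print(result)
```

Unlike the approaches above, this leaves any other text in the string untouched, which may or may not be what you want.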