I have a series of strings in a vector and need to remove the common starting pattern from each string. However, I don't know the pattern or how long it is.
stringa <- c("apple_tart", "apple_pie", "apple_fritter")
stringb <- c("baby breath", "baby oil", "baby doll", "baby name")
I need a function or method that works for both a and b. The results should be:
resultsa <- c("tart", "pie", "fritter")
resultsb <- c("breath", "oil", "doll", "name")
I know I could do this with str_remove if I knew the pattern or how long the matching pattern was. Is there a way to do this? Perhaps by first finding the common starting pattern and then using str_remove?
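Use recursion:
remove_common <- function(x){
  # Shared first character, if there is exactly one
  a <- unique(substr(x, 1, 1))
  # Stop when the first characters differ or the strings are used up;
  # without the second check, identical strings would recurse forever
  if(length(a) != 1 || a %in% c("", NA)) return(x)
  # Drop the shared first character and repeat
  Recall(substring(x, 2))
}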
remove_common(stringa)
[1] "tart" "pie" "fritter"
remove_common(stringb)
[1] "breath" "oil" "doll" "name"
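The question asks about finding the prefix first and removing it afterwards. A minimal sketch of that two-step variant follows; `common_prefix` is a hypothetical helper name, not part of base R or stringr, and the sketch assumes a character vector without `NA` values.

```r
# Hypothetical helper: returns the longest prefix shared by every element of x.
# Assumes x is a character vector with no NA values.
common_prefix <- function(x) {
  if (length(x) == 0L) return("")
  limit <- min(nchar(x))
  k <- 0L
  # Grow k while the first k + 1 characters are identical across all strings
  while (k < limit && length(unique(substr(x, 1, k + 1L))) == 1L) {
    k <- k + 1L
  }
  substr(x[1], 1, k)
}

stringa <- c("apple_tart", "apple_pie", "apple_fritter")
prefix <- common_prefix(stringa)
# Drop the prefix by position, so it is never interpreted as a regex
resulta <- substring(stringa, nchar(prefix) + 1L)
```

Keeping the prefix as a separate value makes it easy to inspect before removing it, and a loop avoids deep recursion when the shared prefix is long.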
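Another base R option, using a regex backreference (this assumes the strings contain no newlines):
fn <- function(x){
  n <- length(x) - 1
  # Join with newlines so that "." cannot match across strings
  y <- paste0(x, collapse = "\n")
  # Group 1: longest prefix of the first string that also starts every later string
  pat <- regmatches(y, regexec(sprintf("^(.*)(?:.*\\n\\1){%d}", n), y, perl = TRUE))
  # Remove by position so the prefix is never interpreted as a regex
  substring(x, nchar(unlist(pat)[2]) + 1)
}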
fn(stringa)
[1] "tart" "pie" "fritter"
fn(stringb)
[1] "breath" "oil" "doll" "name"
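Another way, comparing the strings character by character:
fn <- function(x){
  # Common prefix of two vectors of single characters
  f <- function(x, y){
    n <- seq_len(min(length(x), length(y)))
    # cumsum() stays at 0 until the first mismatch
    x[n][cumsum(x[n] != y[n]) == 0]
  }
  v <- Reduce(f, strsplit(x, ""))
  # Remove by position so the prefix is never interpreted as a regex
  substring(x, length(v) + 1)
}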
fn(stringa)
[1] "tart" "pie" "fritter"
fn(stringb)
[1] "breath" "oil" "doll" "name"
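One thing to watch with any of these approaches: once the prefix is known, `sub()` treats it as a regular expression unless `fixed = TRUE` is given, so a prefix containing metacharacters may fail to match. A small sketch with made-up strings (the vector `x` and the variable names are illustrative only):

```r
# Made-up example: the shared prefix "a+b_" contains a regex metacharacter
x <- c("a+b_one", "a+b_two")
prefix <- "a+b_"

# As a regex, "a+b_" means one or more "a" followed by "b_", so nothing matches
r_regex <- sub(prefix, "", x)                # "a+b_one" "a+b_two"

# Literal matching removes the prefix as intended
r_fixed <- sub(prefix, "", x, fixed = TRUE)  # "one" "two"

# Removing by position avoids pattern matching altogether
r_pos <- substring(x, nchar(prefix) + 1L)    # "one" "two"
```

Removing by position is the simplest choice here, since the prefix is already known to sit at the start of every string.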