I apologize in advance for the naivety of my question; I could not find a way to do this in R. I have several character strings that look like this:
sample.names1<-c("S0938-CR1","S0957-AB8","S0971-EGFP1-10")
I would like to remove the characters that appear before the first "-", along with that "-" itself, in order to keep only CR1, AB8 and EGFP1-10.
I tried:
sample.names <- sapply(strsplit(basename(sample.names1), "-"), `[`, 2)
But this did not keep what came after the second "-". Thank you!
In (1), "^" matches the beginning of the string, ".*?" matches as few characters as possible (the "?" makes ".*" non-greedy), and "-" matches itself, so sub removes everything up to and including the first "-".
In (2), strcapture extracts everything after the first "-", producing a data.frame, which we then reduce to a vector.
In (3), we show a strsplit solution: split at every "-", drop the first piece, and paste the remaining pieces back together with "-".
In (4), we replace the first "-" with a "/" and then, treating the result as a file path, extract the base name. This assumes the remaining text contains no "/" of its own.
In (5), we use regexpr to find the position of the first "-" and then use substring with that position plus 1 to extract the desired portion.
# 1
sub("^.*?-", "", sample.names1)
## [1] "CR1" "AB8" "EGFP1-10"
# 2
strcapture("-(.*)", sample.names1, list(""))[[1]]
## [1] "CR1" "AB8" "EGFP1-10"
# 3
sapply(strsplit(sample.names1, "-"), \(x) paste(tail(x, -1), collapse = "-"))
## [1] "CR1" "AB8" "EGFP1-10"
# 4
basename(sub("-", "/", sample.names1))
## [1] "CR1" "AB8" "EGFP1-10"
# 5
substring(sample.names1, regexpr("-", sample.names1) + 1)
## [1] "CR1" "AB8" "EGFP1-10"
The input as shown in the question:
sample.names1 <- c("S0938-CR1", "S0957-AB8", "S0971-EGFP1-10")
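As a quick sanity check, here is a minimal sketch that shows why the attempt in the question loses the trailing piece and then compares the five approaches against the desired result. The names pieces, attempt, expected and results are made up for this illustration and are not part of the code above.

```r
sample.names1 <- c("S0938-CR1", "S0957-AB8", "S0971-EGFP1-10")

# strsplit() splits at every "-", so the third string yields three pieces
# and `[`(x, 2) keeps only the one between the first and second "-".
pieces <- strsplit(sample.names1, "-")
attempt <- sapply(pieces, `[`, 2)
attempt

# Illustrative names only: the desired result and the five approaches.
expected <- c("CR1", "AB8", "EGFP1-10")
results <- list(
  sub        = sub("^.*?-", "", sample.names1),
  strcapture = strcapture("-(.*)", sample.names1, list(""))[[1]],
  strsplit   = sapply(pieces, function(x) paste(tail(x, -1), collapse = "-")),
  basename   = basename(sub("-", "/", sample.names1)),
  substring  = substring(sample.names1, regexpr("-", sample.names1) + 1)
)

# One logical value per approach: does it reproduce the expected vector?
sapply(results, identical, expected)
```

On this input, attempt ends in "EGFP1" rather than "EGFP1-10", which is the problem described in the question. The approaches may behave differently from one another on strings that contain no "-" at all, so it is worth checking that case separately if it can occur in your data.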