I have a vector of strings which look like this:
ABC_EFG_HIG_ADF_AKF_MNB
Now, from each element, I want to extract the third underscore-separated substring (counting from the left), i.e. HIG in this case. How can I achieve this in R?
We can use sub. We match one or more characters that are not _ ([^_]+) followed by a _, and keep this in a capture group. Since we want to extract the third set of non-_ characters, we repeat that group two times ({2}), followed by a second capture group of one or more non-_ characters, and then the rest of the string, matched by .*. In the replacement, we use the backreference to the second capture group (\\2).
sub("^([^_]+_){2}([^_]+).*", "\\2", str1)
#[1] "HIG"
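Since the question mentions a vector of strings, here is a minimal sketch of the same sub call applied to several elements at once. The vector strs is hypothetical (only its first element comes from the question). sub is vectorised, and an element with fewer than three fields does not match the pattern, so it is returned unchanged.

```r
# Hypothetical input vector; only the first element is from the question.
strs <- c("ABC_EFG_HIG_ADF_AKF_MNB", "ONE_TWO_THREE_FOUR", "A_B")

# sub() works element-wise; non-matching elements are returned as-is.
third_sub <- sub("^([^_]+_){2}([^_]+).*", "\\2", strs)
third_sub
#[1] "HIG"   "THREE" "A_B"
```

If unchanged input is not a useful result for short strings, those elements would need to be checked separately.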
Another option is scan:
scan(text=str1, sep="_", what="", quiet=TRUE)[3]
#[1] "HIG"
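Note that scan reads all fields from all elements into one flat vector, so indexing with [3] only gives the third field of the first string. For a whole vector, one possible alternative (not part of the approaches above) is base R's strsplit combined with vapply. This is a sketch, and the vector strs is again hypothetical.

```r
# Hypothetical input vector; only the first element is from the question.
strs <- c("ABC_EFG_HIG_ADF_AKF_MNB", "ONE_TWO_THREE_FOUR", "A_B")

# Split each string on a literal underscore; the result is a list of character vectors.
parts <- strsplit(strs, "_", fixed = TRUE)

# Take the third piece of each element; strings with fewer than three pieces give NA.
third_split <- vapply(parts, function(p) p[3], character(1))
third_split
#[1] "HIG"   "THREE" NA
```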
A similar option, as mentioned by @RHertel, would be to use read.table/read.csv on the string:
read.table(text=str1,sep = "_", stringsAsFactors=FALSE)[,3]
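The read.table approach should also extend to a vector, because each element passed to text is read as one line. The sketch below assumes that every string has the same number of fields. The vector strs2 and its second element are hypothetical, and colClasses = "character" is added here only to keep fields such as T or NA from being converted to other types.

```r
# Hypothetical input vector with the same number of fields in each element.
strs2 <- c("ABC_EFG_HIG_ADF_AKF_MNB", "QRS_TUV_WXY_ZAB_CDE_FGH")

# Each element becomes one row; column 3 holds the third field of every string.
third_tab <- read.table(text = strs2, sep = "_", colClasses = "character")[, 3]
third_tab
#[1] "HIG" "WXY"
```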
Data used above:
str1 <- "ABC_EFG_HIG_ADF_AKF_MNB"