[SOLVED] Change complicated strings in R with gsub or stringr

I have a data frame column that contains thousands of complicated sample names like this:

sample <- c("16_3_S16_R1_001", "16_3_S16_R2_001", "2_3_S2_R1_001", "2_3_S2_R2_001")

I am trying, with no success, to change the sample names to the following: 16.3R1, 16.3R2, 2.3R1, 2.3R2

I am thinking of solving the problem with gsub or stringr. Any suggestions? I have tried gsub but could not get the desired names.

Solution

You can use sub to extract the parts:

sample <- c("16_3_S16_R1_001","16_3_S16_R2_001","2_3_S2_R1_001","2_3_S2_R2_001")
sub('(\\d+)_(\\d+)_.*(R\\d+).*', '\\1.\\2\\3', sample)
#[1] "16.3R1" "16.3R2" "2.3R1"  "2.3R2"

\\d+ matches one or more digits. The parts of the pattern enclosed in () are called capture groups. Here we capture one or more digits (group 1), followed by an underscore and one or more digits (group 2), and finally "R" followed by one or more digits (group 3). The captured values are referred to in the replacement with backreferences: \\1 is the first group, \\2 the second, and so on. Everything else the pattern matches (the underscores, the S16 part and the trailing _001) is dropped from the result.
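Since the question mentions a data frame column, the same substitution could be applied there directly. The following is only a sketch: the data frame df, its column names sample and short, and the stricter anchored pattern are assumptions for illustration, not part of the original question. Note that sub() returns non-matching strings unchanged, so with thousands of names it may be worth flagging the ones that do not fit the expected layout.

```r
# Hypothetical data frame; the names `df`, `sample` and `short` are assumptions
df <- data.frame(
  sample = c("16_3_S16_R1_001", "16_3_S16_R2_001", "2_3_S2_R1_001", "2_3_S2_R2_001"),
  stringsAsFactors = FALSE
)

# Anchored variant of the pattern: digits, digits, S + digits, R + digits, digits
pattern <- "^(\\d+)_(\\d+)_S\\d+_(R\\d+)_\\d+$"

# sub() leaves non-matching strings untouched, so check for matches first
matched <- grepl(pattern, df$sample)

# Rewrite the matching names; mark anything unexpected as NA for later inspection
df$short <- ifelse(matched, sub(pattern, "\\1.\\2\\3", df$sample), NA_character_)

df$short
```

Any rows where df$short is NA would be names that do not follow the assumed layout and need a closer look.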