Assume we have a string:
test <- 'chr1:949920-950500_ENSG_00000187583'
Expected output:
'chr1:949920-950500' 'ENSG_00000187583'
We tried:
strsplit(test, '_(?<=ENS)', perl = TRUE)
[[1]]
[1] 'chr1:949920-950500_ENSG_00000187583'
We also want to split strings that follow this pattern:
"chr1:949920-950500_P16"
# split to
"chr1:949920-950500" "P16"
(?<=ENS) means "preceded by ENS". The position after the _ can never be preceded by ENS, so _(?<=ENS) can't ever match.
Are you trying to split on all the _ that are followed by ENSG?
_(?=ENSG)
Read this as "_ followed by ENSG".
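As a quick check, here is a minimal sketch of the lookahead version applied to the string from the question. Note that it only splits when ENSG follows the _, so it would leave an input like "chr1:949920-950500_P16" unsplit.

```r
test <- 'chr1:949920-950500_ENSG_00000187583'

# Split on the "_" that is immediately followed by "ENSG".
# The lookahead is zero-width, so "ENSG" stays in the second piece.
parts <- strsplit(test, '_(?=ENSG)', perl = TRUE)[[1]]
print(parts)
# [1] "chr1:949920-950500" "ENSG_00000187583"
```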
Are you trying to split on all the _ that aren't preceded by ENSG?
You can use either of these:
(?<!ENSG)_
_(?<!ENSG_)
(The second might be a tad more efficient, but I don't think it's worth the extra complexity.)
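A small sketch of both negative-lookbehind patterns, run on the two example strings from the question. The names inputs, res1 and res2 are just illustrative. Both patterns should give the same pieces, and both handle the P16 case as well.

```r
inputs <- c('chr1:949920-950500_ENSG_00000187583',
            'chr1:949920-950500_P16')

# Split on every "_" that is not directly preceded by "ENSG".
res1 <- strsplit(inputs, '(?<!ENSG)_', perl = TRUE)

# Same idea, but the lookbehind runs after the "_" has been consumed,
# so it has to include the "_" itself.
res2 <- strsplit(inputs, '_(?<!ENSG_)', perl = TRUE)

print(res1)
# [[1]]
# [1] "chr1:949920-950500" "ENSG_00000187583"
#
# [[2]]
# [1] "chr1:949920-950500" "P16"

print(identical(res1, res2))
# [1] TRUE
```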