I have data like the following:
A <- c("-0.00023--0.00243unitincrease", "-0.00176-0.02176pmol/Lincrease(replication)",
"0.00180-0.01780%varianceunitdecrease")
I want to split each string into its numeric part and the remaining text, as two columns B and C. After extraction, the data frame should look like this:
# A B C
# -0.00023--0.00243unitincrease -0.00023--0.00243 unitincrease
# -0.00176-0.02176pmol/Lincrease(replication) -0.00176-0.02176 pmol/Lincrease(replication)
# 0.00180-0.01780%varianceunitdecrease 0.00180-0.01780 %varianceunitdecrease
How can I get that result in R?
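Use strsplit with a positive lookbehind and lookahead. The pattern matches the zero-width position that is preceded by a digit and followed by a character in [a-z%], so the string is split there without losing any characters. The class [a-z%] covers the lowercase letters a to z plus the % sign; expand it if the text part can start with other characters.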
r1 <- do.call(rbind, strsplit(A, "(?<=\\d)(?=[a-z%])", perl=TRUE))
res1 <- setNames(as.data.frame(cbind(A, r1)), LETTERS[1:3])
res1
# A B C
# 1 -0.00023--0.00243unitincrease -0.00023--0.00243 unitincrease
# 2 -0.00176-0.02176pmol/Lincrease(replication) -0.00176-0.02176 pmol/Lincrease(replication)
# 3 0.00180-0.01780%varianceunitdecrease 0.00180-0.01780 %varianceunitdecrease
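If listing every possible first character of the text part is impractical, a different approach (not the strsplit method above) is to capture the leading run of digits, dots, and minus signs with sub. This is only a sketch: the names pat and res3 are made up for illustration, and it assumes the text part never begins with a digit, a dot, or a minus sign, since those would be absorbed into the numeric part.

```r
A <- c("-0.00023--0.00243unitincrease",
       "-0.00176-0.02176pmol/Lincrease(replication)",
       "0.00180-0.01780%varianceunitdecrease")

# Group 1: leading run of minus signs, digits, and dots (the numeric range).
# Group 2: everything that follows (the text part).
pat <- "^([-0-9.]+)(.*)$"

B <- sub(pat, "\\1", A)  # keep only group 1
C <- sub(pat, "\\2", A)  # keep only group 2

# Assemble the three columns; keep strings as character, not factor
res3 <- data.frame(A, B, C, stringsAsFactors = FALSE)
res3
```

Because the character class describes the numeric part rather than the text part, units starting with uppercase letters or other symbols need no change to the pattern.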
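You may also want the two numbers as separate numeric columns. Adding the alternative (?<=\\d)- to the pattern also splits on a hyphen that directly follows a digit, so the range separator is removed while the minus sign of a negative number is kept: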
res2 <- type.convert(as.data.frame(
do.call(rbind, strsplit(A, "(?<=\\d)-|(?<=\\d)(?=[a-z%])", perl=TRUE))))
res2
# V1 V2 V3
# 1 -0.00023 -0.00243 unitincrease
# 2 -0.00176 0.02176 pmol/Lincrease(replication)
# 3 0.00180 0.01780 %varianceunitdecrease
The structure confirms that the first two columns are numeric:
str(res2)
# 'data.frame': 3 obs. of 3 variables:
# $ V1: num -0.00023 -0.00176 0.0018
# $ V2: num -0.00243 0.02176 0.0178
# $ V3: Factor w/ 3 levels "%varianceunitdecrease",..: 3 2 1