I use SOAP to extract data from the BRENDA enzyme database. After extraction I get the following flat string:
ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6
I want to reshape the data into the following table:
| ecNumber | piValue |
|---|---|
| 3.2.1.23 | 6.9 |
| 3.2.1.23 | 7.1 |
| 4.4.1.14 | 6 |
Can I do that with awk or some other bash command? Or in R?
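In base R, we may use read.dcf after rewriting the string as DCF records: insert ": " between each key and its value, turn each # into a newline (\n), and turn each ! into a blank line (\n\n), which separates records.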
str1 <- "ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6"
str2 <- gsub("#", "\n", gsub("!", "\n\n", gsub("([a-z])([0-9])", "\\1: \\2", str1)))
read.dcf(textConnection(str2), all = TRUE)
  ecNumber piValue
1 3.2.1.23     6.9
2 3.2.1.23     7.1
3 4.4.1.14       6
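The read.dcf route depends on every key ending in a lowercase letter directly followed by a digit, and it returns both columns as character. If the key names are known in advance, a more explicit alternative is to split on the delimiters directly. The following is only a sketch under that assumption: it hard-codes the key names ecNumber and piValue and their order within each record, and the names records and out are just illustrative.

```r
# Same input string as in the question.
str1 <- "ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6"

# Split into records on "!", then split each record into its fields on "#".
records <- strsplit(strsplit(str1, "!", fixed = TRUE)[[1]], "#", fixed = TRUE)

# Strip the known key prefix from each field.
# Assumption: every record holds ecNumber first and piValue second.
out <- data.frame(
  ecNumber = sub("^ecNumber", "", sapply(records, `[`, 1)),
  piValue  = as.numeric(sub("^piValue", "", sapply(records, `[`, 2))),
  stringsAsFactors = FALSE
)
out
```

Here piValue comes back as numeric rather than character, which may be convenient for later filtering or plotting. If BRENDA returns records with missing or reordered fields, this sketch would need adjusting, whereas read.dcf matches fields by name.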