r bash flat-file

How to reshape flat data from a SOAP result with R or Bash?


I use SOAP to extract data from the BRENDA enzyme database. The extraction returns flat data like this:

ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6

I want to reshape the data into the following table:

ecNumber piValue
3.2.1.23 6.9
3.2.1.23 7.1
4.4.1.14 6

Can I do that with awk, or with some other bash command? Or in R?


Solution

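  • In base R, we can use read.dcf once the string is rewritten in DCF form: insert ": " between each field name and its value, replace each # with a newline (one field per line), and replace each ! with a blank line (one record per block)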

    str2 <- gsub("#", "\n", gsub("!", "\n\n", gsub("([a-z])([0-9])", "\\1: \\2", str1))) 
    read.dcf(textConnection(str2), all = TRUE)
      ecNumber piValue
    1 3.2.1.23     6.9
    2 3.2.1.23     7.1
    3 4.4.1.14       6
    

    data

    str1 <- "ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6"
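Since the question also asks about awk, here is a minimal shell sketch of the same reshaping. It assumes every record has exactly the two fields shown in the sample (ecNumber, then piValue), that records are separated by ! and fields by #, and that neither character occurs inside a value. The function name reshape is made up for illustration.

```shell
# Sample input copied from the question.
str1='ecNumber3.2.1.23#piValue6.9!ecNumber3.2.1.23#piValue7.1!ecNumber4.4.1.14#piValue6'

# reshape (hypothetical helper name): reads the flat string on stdin and
# writes a header line followed by one "ecNumber piValue" row per record.
reshape() {
  tr '!' '\n' |                      # one record per line
    awk -F'#' '
      BEGIN { print "ecNumber piValue" }
      NF == 2 {
        sub(/^ecNumber/, "", $1)     # drop the field label, keep the value
        sub(/^piValue/,  "", $2)
        print $1, $2
      }'
}

printf '%s\n' "$str1" | reshape
```

Records that do not split into exactly two fields are skipped silently by the NF == 2 guard; if the real SOAP output carries more fields per record, the awk body would need one sub() per label.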