arrays regex ruby nlp freeling

How to convert command-line output from Freeling to a consumable array


I am using Ruby for this. Freeling (an NLP tool) has a shallow parser that returns a string like the following when I run a shallow parsing command on the text "I just read the book, the grasshopper lies heavy".

a = <<EOT
S_[
  sn-chunk_[
    +(I i PRP -)
  ]
  adv_[
    +(just just RB -)
  ]
  vb-chunk_[
    +(read read VB -)
  ]
  sn-chunk_[
    (the the DT -)
    +n-chunk_[
      (book book NN -)
      +n-chunk_[
        +(The_Grasshopper_Lies_Heavy the_grasshopper_lies_heavy NP -)
      ]
    ]
  ]
  st-brk_[
    +(. . Fp -)
  ]
]

EOT

I want to get the following array from this:

["I", "just", "read", "the book The Grasshopper Lies Heavy","."]

(I want to merge the words under each top-level subtree into a single array element.)

So far, I have written this much:

b = a.gsub(/.*\[/,'[').gsub(/.*\+?\((\w+|.) .*/,'\1').gsub(/\n| /,"").gsub("_"," ")

which returns

[[I][just][read][the[book[The Grasshopper Lies Heavy]]][.]]

So, how can I get the desired array?


Solution

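  • Building on your solution so far (with underscores turned into spaces), split the string on "][" and strip the brackets left over in each piece: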

    result = a.gsub(/.*\[/,'[').gsub(/.*\+?\((\w+|.) .*/,'\1').gsub(/\n| /,"").gsub("_"," ")
    # Split at the "][" chunk boundaries, then turn leftover brackets into spaces and trim.
    result.split('][').map { |s| s.gsub(/\[|\]/, ' ').strip }
    # => ["I", "just", "read", "the book The Grasshopper Lies Heavy", "."]
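The same grouping can also be done without flattening the tree into a bracket string. What follows is only a sketch of an alternative: it reads the output line by line and tracks nesting depth. The helper name chunk_phrases is hypothetical (it is not part of Freeling). The sketch assumes the layout shown in the question: every chunk opens with a line ending in "_[", closes with a line containing only "]", and each leaf line starts with the word form inside parentheses.

```ruby
# Hypothetical helper (not part of Freeling): joins the word forms found under
# each direct child of the root S_[ ... ] node into one string.
# Assumes the output layout shown in the question.
def chunk_phrases(parse)
  phrases = []
  depth = 0
  parse.each_line do |line|
    line = line.strip
    if line.end_with?("_[")
      depth += 1
      phrases << [] if depth == 2        # a direct child of S_[ starts a new phrase
    elsif line == "]"
      depth -= 1
    elsif (m = line.match(/\A\+?\((\S+) /))
      # The first field in the parentheses is the word form; multiword tokens
      # are joined with "_" in the output, so turn those back into spaces.
      word = m[1].tr("_", " ")
      if depth >= 2
        phrases.last << word
      else
        phrases << [word]                # a leaf directly under the root stands alone
      end
    end
  end
  phrases.map { |words| words.join(" ") }
end

sample = <<~EOT
  S_[
    sn-chunk_[
      +(I i PRP -)
    ]
    adv_[
      +(just just RB -)
    ]
    vb-chunk_[
      +(read read VB -)
    ]
    sn-chunk_[
      (the the DT -)
      +n-chunk_[
        (book book NN -)
        +n-chunk_[
          +(The_Grasshopper_Lies_Heavy the_grasshopper_lies_heavy NP -)
        ]
      ]
    ]
    st-brk_[
      +(. . Fp -)
    ]
  ]
EOT

p chunk_phrases(sample)
# => ["I", "just", "read", "the book The Grasshopper Lies Heavy", "."]
```

Compared with the gsub chain, this version does not depend on "][" only ever appearing between top-level chunks, at the cost of a few more lines.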