nlp stanford-nlp

Extracting clauses from Penn Treebank-formatted text


Say I have a sentence:

After he had eaten the cheese, Bill went to the grocery.

In my program, I get the following output:

---PARSE TREE---
(ROOT
  (S
    (SBAR (IN After)
      (S
        (NP (PRP he))
        (VP (VBD had)
          (VP (VBN eaten)
            (NP (DT the) (NN cheese))))))
    (, ,)
    (NP (NNP Bill))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NN grocery))))
    (. .)))

How would I merge the words that are not inside a nested clause into an independent clause of their own? Like this:

S Clause {
    SBAR Clause {
         After he had eaten the cheese,
    }

    S Clause {
        Bill went to the grocery.
    }
}

I'm not sure this is clear, but basically I want to extract the independent and dependent clauses of the sentence, along with the subclauses of those clauses.


Solution

  • The NLTK guide has demonstration code for working with parse trees, although it doesn't explicitly show how to extract a clause: http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html
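
One possible way to do the extraction itself is sketched below in plain Python, without NLTK, so that it runs on its own. It assumes the parser output is a single bracketed tree like the one in the question, and it treats the Penn Treebank labels S, SBAR, SBARQ, SINV and SQ as clauses. The names `parse_tree`, `leaves` and `split_clauses` are made up for this illustration and are not part of any library. Nested clauses are kept as nested tuples, and runs of sibling words that are not inside a nested clause are merged into one string.

```python
import re

# Clause-level labels in the Penn Treebank tag set (assumed to be enough here).
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}

SENTENCE = """
(ROOT
  (S
    (SBAR (IN After)
      (S
        (NP (PRP he))
        (VP (VBD had)
          (VP (VBN eaten)
            (NP (DT the) (NN cheese))))))
    (, ,)
    (NP (NNP Bill))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NN grocery))))
    (. .)))
"""


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def split_clauses(node):
    """Return (label, parts) for a clause node.

    Each part is either a nested clause, again as (label, parts), or a
    string holding the merged words of adjacent non-clause siblings.
    """
    label, children = node
    parts, run = [], []
    for child in children:
        if not isinstance(child, str) and child[0] in CLAUSE_LABELS:
            if run:                       # flush words collected so far
                parts.append(" ".join(run))
                run = []
            parts.append(split_clauses(child))
        else:
            run.extend([child] if isinstance(child, str) else leaves(child))
    if run:
        parts.append(" ".join(run))
    return (label, parts)


if __name__ == "__main__":
    root = parse_tree(SENTENCE)
    top_clause = root[1][0]               # the S node directly under ROOT
    print(split_clauses(top_clause))
```

For the sentence in the question this gives an S clause containing an SBAR clause ("After" plus a nested S clause "he had eaten the cheese") followed by the merged string ", Bill went to the grocery .". The comma ends up with the main clause because the parser attaches it as a sibling of SBAR; moving leading punctuation onto the preceding clause would be a small extra step.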