python nltk text-chunking

Parse NLTK tree output into a list of noun phrases


I have the following text:

text  = '''If you're in construction or need to pass fire inspection, or just want fire resistant materials for peace of mind, this is the one to use. Check out 3rd party sellers as well Skylite'''

I applied NLTK chunking to it and get a tree as output.

import nltk

sentences = nltk.sent_tokenize(text)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
sentences = [nltk.pos_tag(sent) for sent in sentences]

grammar = """NP: {<DT>?<JJ>*<NN.*>+}
       RELATION: {<V.*>}
                 {<DT>?<JJ>*<NN.*>+}
       ENTITY: {<NN.*>}"""

cp = nltk.RegexpParser(grammar)
for i in sentences:
    result = cp.parse(i)
    print(result)
    print(type(result))
    result.draw() 

The output for the first sentence is as follows:

(S If/IN you/PRP (RELATION 're/VBP) in/IN (NP construction/NN) or/CC (NP need/NN) to/TO (RELATION pass/VB) (NP fire/NN inspection/NN) ,/, or/CC just/RB (RELATION want/VB) (NP fire/NN) (NP resistant/JJ materials/NNS) for/IN (NP peace/NN) of/IN (NP mind/NN) ,/, this/DT (RELATION is/VBZ) (NP the/DT one/NN) to/TO (RELATION use/VB) ./.)

How can I get the noun phrases as a list of strings, like this:

[construction, need, fire inspection, fire, resistant materials, peace, mind, the one]

Any suggestions?


Solution

  • Something like this, which joins the leaves of every subtree labeled NP:

    noun_phrases_list = [[' '.join(leaf[0] for leaf in tree.leaves()) 
                          for tree in cp.parse(sent).subtrees() 
                          if tree.label()=='NP'] 
                          for sent in sentences]
    #[['construction', 'need', 'fire inspection', 'fire', 'resistant materials', 
    #  'peace', 'mind', 'the one'], 
    # ['party sellers', 'Skylite']]
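
If it helps, the same idea can be wrapped in a small helper. This is only a sketch: `noun_phrases` is a name made up for illustration, the grammar is cut down to the NP rule from the question, and the tokens are hand-tagged with the tags shown in the question's output, so it runs without downloading any tokenizer or tagger models.

```python
from nltk import RegexpParser

# Only the NP rule from the question's grammar; the other rules are not
# needed to extract noun phrases.
cp = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

# Hand-tagged fragment of the sentence, with tags copied from the output
# in the question (an assumption made so that no tagger is required).
tagged = [("to", "TO"), ("pass", "VB"), ("fire", "NN"), ("inspection", "NN"),
          (",", ","), ("or", "CC"), ("just", "RB"), ("want", "VB"),
          ("fire", "NN"), ("resistant", "JJ"), ("materials", "NNS"),
          ("for", "IN"), ("peace", "NN"), ("of", "IN"), ("mind", "NN")]


def noun_phrases(tree, label="NP"):
    """Return the text of every subtree of `tree` carrying `label`."""
    return [
        # Each leaf of a chunk tree is a (word, tag) pair; keep the word only.
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


phrases = noun_phrases(cp.parse(tagged))
print(phrases)
```

Passing a different `label` (for example `'RELATION'` with the full grammar from the question) should return those chunks instead.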