python, arrays, loops, grammar

Python: tokenize text and group the output words by part of speech


The code below tokenises the text and identifies the part of speech of each token.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import wordnet as wn

#nltk.download()

text = "Natural language processing is fascinating"

# tokenise the sentence
words = word_tokenize(text)
print(words)


# look up each word's part of speech (noun, verb, etc.) from its first WordNet synset
for w in words:
    tmp = wn.synsets(w)[0].pos()
    print(w, ":", tmp)

The output is:

['Natural', 'language', 'processing', 'is', 'fascinating']
Natural : n
language : n
processing : n
is : v
fascinating : v

Here n means noun and v means verb.

Can a Python expert please advise me how to format the output so that it looks like this:

nouns = ["natural", "language", "processing"]
verbs = ["is", "fascinating"] 

I need help changing the format of the output. I assume this requires some additional Python code.


Solution

  • You can achieve it by collecting the words into one list per part of speech:

    # Lists to store parts of speech
    nouns = []
    verbs = []
    
    for w in words:
        synsets = wn.synsets(w)
        if synsets:
            pos = synsets[0].pos()
            if pos == 'n':
                nouns.append(w.lower())
            elif pos == 'v':
                verbs.append(w.lower())
    

    Full solution:

    import nltk
    from nltk.tokenize import word_tokenize
    from nltk.corpus import wordnet as wn
    
    # Make sure the necessary NLTK data is available
    nltk.download('punkt')
    nltk.download('wordnet')
    nltk.download('punkt_tab')
    
    text = "Natural language processing is fascinating"
    
    # Tokenize the text
    words = word_tokenize(text)
    
    # Lists to store parts of speech
    nouns = []
    verbs = []
    
    for w in words:
        synsets = wn.synsets(w)
        if synsets:
            pos = synsets[0].pos()
            if pos == 'n':
                nouns.append(w.lower())
            elif pos == 'v':
                verbs.append(w.lower())
    
    print(f"nouns = {nouns}")
    print(f"verbs = {verbs}")
    

    Output:

    nouns = ['natural', 'language', 'processing']
    verbs = ['is', 'fascinating']
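
If the text may also contain adjectives or adverbs, the two hard-coded lists can be generalised to a dictionary keyed by part of speech. The following is only a sketch under some assumptions: `group_by_pos`, `first_synset_pos` and `POS_NAMES` are hypothetical names introduced here for illustration, and the mapping relies on WordNet's single-letter codes (`n`, `v`, `a`, `s`, `r`). The lookup is passed in as a function, so the grouping logic can be tried without NLTK data.

```python
from collections import defaultdict

# WordNet's single-letter part-of-speech codes mapped to readable names.
# 's' (adjective satellite) is filed under adjectives here; that is a choice
# made for this sketch, not something the original answer specifies.
POS_NAMES = {
    "n": "nouns",
    "v": "verbs",
    "a": "adjectives",
    "s": "adjectives",
    "r": "adverbs",
}


def group_by_pos(words, lookup_pos):
    """Group lowercased words under a readable part-of-speech name.

    lookup_pos is a hypothetical callable that returns a WordNet POS code
    for a word, or None when the word is unknown.
    """
    groups = defaultdict(list)
    for w in words:
        pos = lookup_pos(w)
        if pos is not None:
            # Fall back to the raw code if it is not in POS_NAMES.
            groups[POS_NAMES.get(pos, pos)].append(w.lower())
    return dict(groups)


def first_synset_pos(word):
    """Same lookup as in the answer: POS of the first WordNet synset."""
    # Imported lazily so that group_by_pos can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    synsets = wn.synsets(word)
    return synsets[0].pos() if synsets else None
```

With the `words` list from the full solution, `group_by_pos(words, first_synset_pos)` should give a dictionary with one list per part of speech found. Note that the first synset reflects the word's most common sense in WordNet, not its role in this particular sentence.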
    
