python stanford-nlp part-of-speech

Extract POS tag for a word coming before a given word


I am new to Python and I am trying to extract the part-of-speech tag (Stanford CoreNLP) of the word that comes before a given word, for the text = "انسان يحضر طعامه باستخدام الخبز الابيض وبجانبه قطة سوداء؟"

Here is my code:

import re

for i in nouns:
    # Match the word that comes right before the noun i
    pattern = r"\w+(?=\s*" + i + "[^/])"
    re1 = re.search(pattern, text)
    if re1:
        for tag in tagger.tag(text.split()):       # POS tag extractor
            if re1[0] in tag[1]:
                for specific in tag[1].split():
                    if re1[0] in specific:
                        print("The Noun " + i + ":-")
                        print(specific)

where nouns is a list containing all the NN words in the text: ['انسان', 'طعام', 'استخدام', 'جانب', 'قطة']. I tried to use a regular expression to extract the word before each noun.

The output was:

The Noun طعام:-
يحضر/VBP
The Noun استخدام:-
ب/IN
The Noun استخدام:-
الخبز/DTNN
The Noun استخدام:-
الابيض/DTJJ
The Noun استخدام:-
ب/IN
The Noun استخدام:-
جانب/NN
The Noun جانب:-
ب/IN
The Noun جانب:-
الخبز/DTNN
The Noun جانب:-
الابيض/DTJJ
The Noun جانب:-
ب/IN
The Noun جانب:-
جانب/NN
The Noun قطة:-
ه/PRP$
The Noun قطة:-
ه/PRP$

There are repeated words, and I really could not figure out the issue.


Solution

  • The issue was in the line

    if re1[0] in tag[1]:

    This is a substring test, so it accepts every tag[1] string that contains re1[0] anywhere, whether re1[0] is a whole word or just a single character inside another word.

    To fix it, I used a regular expression to match the exact word in tag[1]:

    if re.match(r'\b' + re1[0] + r'(?!\.?\d)', tag[1]):
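
The difference between the substring test and an exact word match can be sketched without the tagger. This is only an illustration, not the method used above: it assumes each tagged token is a "word/TAG" string like the ones printed in the question, and the helper names split_tagged and tags_for_word are hypothetical. It compares the word part for equality instead of using a regular expression, which may be worth considering because re.match only anchors the start of the string, so a longer word that merely begins with re1[0] could still pass.

```python
def split_tagged(token):
    """Split a 'word/TAG' string into (word, tag) at the last slash."""
    word, _, tag = token.rpartition("/")
    return word, tag


def tags_for_word(target, tagged_tokens):
    """Return the 'word/TAG' strings whose word part equals target exactly."""
    return [t for t in tagged_tokens if split_tagged(t)[0] == target]


# Assumed tagger output: a hand-copied subset of the tokens printed in the question.
tagged_tokens = ["يحضر/VBP", "ب/IN", "الخبز/DTNN", "الابيض/DTJJ", "جانب/NN", "ه/PRP$"]

# Exact comparison keeps only the token whose word part is "ب".
exact = tags_for_word("ب", tagged_tokens)

# The substring test from the question also accepts every token that contains "ب".
loose = [t for t in tagged_tokens if "ب" in t]

print(f"exact matches: {len(exact)}, substring matches: {len(loose)}")
```

With this sample list, the exact comparison keeps a single token while the substring test keeps several, which is the same effect that produced the repeated lines in the output.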