python nltk part-of-speech

How to get universal tagset names from nltk pos_tag?


I'm trying to get the full tag name from nltk.pos_tag, but I can't find a simple way to do it with nltk. For example, I'd like the tags I would get with tagset='universal'.

import nltk
from nltk.tokenize import word_tokenize

def nltk_pos(text):
    tokens = word_tokenize(text)
    # return only the tag of the first token
    return nltk.pos_tag(tokens)[0][1]

nltk_pos('home')
output: 'NN'
expected output: 'NOUN'

Solution

  • I had the same problem when doing NLP analysis for a paper I wrote. I had to use a mapping function like this:

    import nltk
    from nltk.tokenize import word_tokenize
    
    def get_full_tag_pos(pos_tag):
        tag_dict = {"J": "ADJ",
                    "N": "NOUN",
                    "V": "VERB",
                    "R": "ADV"}
        # assumes pos_tag is an uppercase Penn Treebank tag, e.g. 'JJR' or 'NN';
        # any tag that does not start with J, N, V or R falls back to 'NOUN'
        return tag_dict.get(pos_tag[0], 'NOUN')
    
    # example (text can be any input string)
    text = "The quick brown fox jumps over the lazy dog"
    words = word_tokenize(text)
    words_pos = nltk.pos_tag(words)
    full_tag_words_pos = [word + "/" + get_full_tag_pos(tag) for word, tag in words_pos]
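
The first-letter mapping above labels every tag outside J/N/V/R as NOUN, so determiners, prepositions and pronouns all come out as nouns. A fuller table avoids that. The sketch below is one possible version: the table PTB_TO_UNIVERSAL and the helper to_universal are names made up for this example, and the table is a hand-written approximation of the Penn Treebank to universal mapping, not something loaded from NLTK. Note that nltk.pos_tag also accepts a tagset='universal' argument, which may be simpler if the universal_tagset resource is available.

```python
# Hand-written Penn Treebank -> universal tag table (an approximation
# written for this sketch; it is not read from NLTK's own mapping files).
PTB_TO_UNIVERSAL = {
    "CC": "CONJ", "CD": "NUM", "DT": "DET", "EX": "DET", "FW": "X",
    "IN": "ADP", "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ", "LS": "X",
    "MD": "VERB", "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "PDT": "DET", "POS": "PRT", "PRP": "PRON", "PRP$": "PRON",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "RP": "PRT", "SYM": "X",
    "TO": "PRT", "UH": "X", "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB", "WDT": "DET",
    "WP": "PRON", "WP$": "PRON", "WRB": "ADV",
}


def to_universal(tagged, default="X"):
    """Map (word, ptb_tag) pairs to (word, universal_tag) pairs."""
    result = []
    for word, tag in tagged:
        if tag and not any(ch.isalnum() for ch in tag):
            # tags made only of symbols (',', '.', '``', ...) are punctuation
            universal = "."
        else:
            # unknown tags fall back to `default` instead of 'NOUN'
            universal = PTB_TO_UNIVERSAL.get(tag.upper(), default)
        result.append((word, universal))
    return result


# Tagged pairs written by hand for illustration, in the shape nltk.pos_tag returns
sample = [("home", "NN"), ("is", "VBZ"), ("nice", "JJ"), (".", ".")]
print(to_universal(sample))
# [('home', 'NOUN'), ('is', 'VERB'), ('nice', 'ADJ'), ('.', '.')]
```

With NLTK installed, the same helper can be applied to real tagger output, e.g. to_universal(nltk.pos_tag(word_tokenize(text))).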