
Java Stanford NLP: Part of Speech labels?


The Stanford NLP tagger, demoed here, gives output like this:

Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.

What do the Part of Speech tags mean? I am unable to find an official list. Is it Stanford's own system, or are they using universal tags? (What is JJ, for instance?)

Also, when I iterate through the sentences looking for nouns, for instance, I end up checking whether the tag .contains("N"). This feels pretty weak. Is there a better way to programmatically search for a certain part of speech?


Solution

  • The tags come from the Penn Treebank Project. Look at its part-of-speech tagging guide (a PostScript file).

    JJ is adjective. NNS is noun, plural. VBP is verb, non-3rd person singular present. RB is adverb.

    That's for English. For Chinese, it's the Penn Chinese Treebank, and for German it's the NEGRA corpus.

    1. CC Coordinating conjunction
    2. CD Cardinal number
    3. DT Determiner
    4. EX Existential there
    5. FW Foreign word
    6. IN Preposition or subordinating conjunction
    7. JJ Adjective
    8. JJR Adjective, comparative
    9. JJS Adjective, superlative
    10. LS List item marker
    11. MD Modal
    12. NN Noun, singular or mass
    13. NNS Noun, plural
    14. NNP Proper noun, singular
    15. NNPS Proper noun, plural
    16. PDT Predeterminer
    17. POS Possessive ending
    18. PRP Personal pronoun
    19. PRP$ Possessive pronoun
    20. RB Adverb
    21. RBR Adverb, comparative
    22. RBS Adverb, superlative
    23. RP Particle
    24. SYM Symbol
    25. TO to
    26. UH Interjection
    27. VB Verb, base form
    28. VBD Verb, past tense
    29. VBG Verb, gerund or present participle
    30. VBN Verb, past participle
    31. VBP Verb, non-3rd person singular present
    32. VBZ Verb, 3rd person singular present
    33. WDT Wh-determiner
    34. WP Wh-pronoun
    35. WP$ Possessive wh-pronoun
    36. WRB Wh­adverb
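
On the second part of the question: going by the list above, checking whether a tag contains "N" also matches IN (preposition) and VBN (past participle), so it is not a reliable test for nouns. One possible approach is to compare against an explicit set of tags, or to use a prefix test such as tag.startsWith("NN"). The sketch below is only an illustration. It parses the word/TAG text shown in the question rather than calling the Stanford API, and the class name PosFilter, the method wordsWithTags, and the constant NOUN_TAGS are made up for this example. It assumes Java 9 or later for Set.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PosFilter {
    // The four Penn Treebank noun tags from the list above (hypothetical constant name).
    static final Set<String> NOUN_TAGS = Set.of("NN", "NNS", "NNP", "NNPS");

    // Returns the words whose tag is in the given set.
    // Assumes the input is whitespace-separated "word/TAG" tokens.
    static List<String> wordsWithTags(String tagged, Set<String> tags) {
        List<String> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Split on the last slash so words that contain '/' keep their text.
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // no tag on this token, skip it
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            if (tags.contains(tag)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tagged = "Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.";
        System.out.println(wordsWithTags(tagged, NOUN_TAGS));
    }
}
```

The same method works for any other category by passing a different set, for example Set.of("VB", "VBD", "VBG", "VBN", "VBP", "VBZ") for verbs.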