nlp nltk wordnet

WordNet not returning pertainym for "South Korean" even though pertainym exists - Python


I'm trying to do a pertainym search for "South Korean":

input = "South Korean.a.01.South Korean"
lemma = wn.lemma(input)

According to the Princeton WordNet page, this should return "South Korea"... yet my code raises an error saying there is no lemma for "south korean" with part of speech 'a'.

nltk.corpus.reader.wordnet.WordNetError: no lemma 'south korean' with part of speech 'a'

The code works with other words like Chinese, Russian, and others using the exact same setup, with the online Princeton search showing the same part of speech (adjective). Any idea why? Maybe there is a special way to input words with spaces in them?

I originally thought maybe there was a discrepancy between WordNet 3.0 and 3.1 and so upgraded to 3.1, but no luck.


Solution

  • Maybe there is a special way to input words with spaces in them?

    Yes, they need to be replaced with underscores.

    So I think your example becomes:

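    from nltk.corpus import wordnet as wn  # import assumed; not shown in the question

    lemma_name = "South_Korean.a.01.South_Korean"  # avoids shadowing the built-in input()
    lemma = wn.lemma(lemma_name)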
    

    The documentation never explicitly mentions it, as far as I can see, but does show it in a few places.

    (I'm not set up to test it at the moment, but if that still doesn't work, try lowercasing the first part, i.e. "south_korean.a.01.South_Korean". According to e.g. https://github.com/nltk/nltk/issues/1641, though, it should match case-insensitively.)
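
To avoid hand-editing every multi-word entry, the underscore substitution can be wrapped in a small helper. This is only a sketch: `wordnet_lemma_name` and `pertainym_names` are hypothetical names made up for illustration, and the second function assumes NLTK and its WordNet corpus are installed. I haven't run the lookup part against a live corpus either.

```python
def wordnet_lemma_name(phrase: str, pos: str = "a", sense: int = 1) -> str:
    """Build a '<synset>.<pos>.<nn>.<lemma>' string for wn.lemma().

    Hypothetical helper: whitespace in the phrase is replaced with
    underscores, which is how WordNet stores multi-word entries.
    """
    token = "_".join(phrase.split())
    return f"{token}.{pos}.{sense:02d}.{token}"


def pertainym_names(phrase: str, pos: str = "a", sense: int = 1) -> list:
    """Look up the lemma and return the names of its pertainyms.

    Assumes NLTK and the WordNet corpus are available; the import is
    done lazily so the string helper above works without NLTK.
    """
    from nltk.corpus import wordnet as wn

    lemma = wn.lemma(wordnet_lemma_name(phrase, pos, sense))
    return [p.name() for p in lemma.pertainyms()]
```

With that in place, `wordnet_lemma_name("South Korean")` produces the same string as in the example above, and `pertainym_names("South Korean")` should, going by the Princeton page mentioned in the question, include the entry for South Korea. Single-word inputs such as "Chinese" pass through unchanged.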