
The lemma of a noun in French


When I run the following code, the lemma TreeTagger returns for the noun "Suppression" is the same word, "Suppression".

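import treetaggerwrapper as tt

tt_fr = tt.TreeTagger(TAGLANG='fr')
# tag_text() returns "word\tPOS\tlemma" strings; make_tags() turns them into Tag(word, pos, lemma) tuples
tags = tt.make_tags(tt_fr.tag_text('Suppression'))
print(tags)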

I expected to get the verb it comes from, "supprimer". Is this because of the language (French)? Is TreeTagger not doing its job? Or am I misunderstanding what a lemma is?


Solution

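  • The lemma of the noun "suppression" is... "suppression". A lemma is the dictionary form that groups a word's inflected forms (so "suppressions" lemmatizes to "suppression"). Going from a noun to the verb it was derived from is derivational morphology, which a lemmatizer such as TreeTagger does not do. What you need is a lexical resource that tells you the verb from which the noun was derived. Have a look at VerbAction, which lists French verbs and their associated deverbal nouns. Just parse the XML into a Python dictionary and look up the corresponding verb for each noun you encounter.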
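Here is a minimal sketch of the "parse the XML into a dictionary" step, using only the standard library's `xml.etree.ElementTree`. The XML layout is an assumption: the sample below, and the tag names `couple`, `verbe` and `nom`, are hypothetical and may not match the actual VerbAction file, so inspect the real file and pass its tag names as parameters. `build_noun_to_verb` and `verb_for` are illustrative helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an ASSUMED layout: one <couple> element per
# verb/noun pair. Check the real VerbAction file and adjust the tag names.
SAMPLE_XML = """<lexique>
  <couple><verbe>supprimer</verbe><nom>suppression</nom></couple>
  <couple><verbe>abaisser</verbe><nom>abaissement</nom></couple>
</lexique>"""


def build_noun_to_verb(xml_text, pair_tag="couple", verb_tag="verbe", noun_tag="nom"):
    """Map each deverbal noun (lowercased) to the verb it derives from.

    The default tag names are assumptions about the file layout,
    not VerbAction's documented schema.
    """
    root = ET.fromstring(xml_text)
    noun_to_verb = {}
    for pair in root.iter(pair_tag):
        verb = pair.findtext(verb_tag)
        noun = pair.findtext(noun_tag)
        if verb and noun:
            # If a noun appears with several verbs, the last one wins.
            noun_to_verb[noun.strip().lower()] = verb.strip().lower()
    return noun_to_verb


def verb_for(lemma, noun_to_verb):
    """Return the source verb of a noun lemma, or None if it is not listed."""
    return noun_to_verb.get(lemma.lower())


table = build_noun_to_verb(SAMPLE_XML)
print(verb_for("Suppression", table))  # supprimer
```

With TreeTagger, you would pass each noun's lemma (the `lemma` field of the tuples returned by `treetaggerwrapper.make_tags`) to `verb_for`. Looking up the lemma rather than the surface form means plurals such as "suppressions" resolve too. If a noun can come from more than one verb in the resource, store lists as dictionary values instead of a single string.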