python nlp nltk porter-stemmer

Gerund form of a word in Python


I'd like to get the gerund form of a string. I have not found a straightforward way to invoke a library to get the gerund.

I applied the spelling rules for adding 'ing', but exceptions to those rules produce some wrong forms. So I check each candidate against the CMU dictionary word list to make sure the generated gerund is a real word. The code looks as follows:

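import cmudict

ing = 'ing'
vowels = "aeiou"
consonants = "bcdfghjklmnpqrstvwxyz"
words = ['lead', 'take', 'hit', 'begin', 'stop', 'refer', 'visit']
cmu_words = set(cmudict.words())  # a set makes each membership test O(1)
cmu_prons = cmudict.dict()        # word -> list of pronunciations
g_w = []


def count_syllables(word):
    # Not shown in the original post; assumed here to count the vowel phonemes
    # (the ones carrying a stress digit) in the first CMU pronunciation.
    prons = cmu_prons.get(word)
    return sum(phone[-1].isdigit() for phone in prons[0]) if prons else 0


for word in words:
    # Guard against one-letter words before indexing word[-2]
    ends_vc = len(word) > 1 and word[-2] in vowels and word[-1] in consonants
    if word[-1] == 'e':
        # Drop the final "e": take -> taking
        if word[:-1] + ing in cmu_words:
            g_w.append(word[:-1] + ing)
    elif count_syllables(word) == 1 and ends_vc:
        if len(word) > 2 and word[-3] in vowels:
            # Two vowels before the consonant, no doubling: lead -> leading
            if word + ing in cmu_words:
                g_w.append(word + ing)
        else:
            # Double the final consonant: hit -> hitting
            if word + word[-1] + ing in cmu_words:
                g_w.append(word + word[-1] + ing)
    elif count_syllables(word) > 1 and ends_vc:
        # Try the doubled form first (begin -> beginning),
        # then fall back to the plain form (visit -> visiting)
        if word + word[-1] + ing in cmu_words:
            g_w.append(word + word[-1] + ing)
        elif word + ing in cmu_words:
            g_w.append(word + ing)

print(g_w)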

The rules are as follows:

When a verb ends in "e", drop the "e" and add "-ing". For example: "take + ing = taking".
When a one-syllable verb ends in vowel + consonant, double the final consonant and add "-ing". For example: "hit + ing = hitting".
When a verb ends in vowel + consonant with stress on the final syllable, double the consonant and add "-ing". For example: "begin + ing = beginning".
Do not double the consonant of words with more than one syllable if the stress is not on the final syllable. For example: "visit + ing = visiting".
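
The rules above can be sketched as a single function. This is only a minimal illustration, not a complete inflector: the name `apply_ing_rules` and the `final_stress` flag are hypothetical, and the caller is assumed to supply the stress information (for example from a pronunciation dictionary). Skipping doubling for "w", "x" and "y", and for a consonant preceded by two vowels, are additions the rules do not state explicitly.

```python
VOWELS = "aeiou"
NEVER_DOUBLED = "wxy"  # assumption: these final letters are not doubled (fix -> fixing)


def apply_ing_rules(verb, final_stress):
    """Return a candidate gerund for `verb`.

    `final_stress` says whether the last syllable is stressed;
    one-syllable verbs should always be passed True.
    """
    # Rule 1: final "e" is dropped (take -> taking).
    if verb.endswith("e"):
        return verb[:-1] + "ing"

    # The verb ends in a single vowel followed by a single consonant.
    ends_vowel_consonant = (
        len(verb) >= 2
        and verb[-1] not in VOWELS
        and verb[-1] not in NEVER_DOUBLED
        and verb[-2] in VOWELS
        and (len(verb) < 3 or verb[-3] not in VOWELS)
    )

    # Rules 2 and 3: double the consonant when the final syllable is stressed.
    if ends_vowel_consonant and final_stress:
        return verb + verb[-1] + "ing"

    # Rule 4 and everything else: just add "ing" (visit -> visiting).
    return verb + "ing"


print(apply_ing_rules("begin", True))   # beginning
print(apply_ing_rules("visit", False))  # visiting
```

Because the function returns only a candidate, a dictionary check like the one in the code above is still useful for catching irregular spellings.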

Is there a more efficient way to get the gerund of a word, if one exists?

Thanks


Solution

  • Maybe this is what you are looking for: a library called pyinflect.

    A python module for word inflections that works as a spaCy extension. To use standalone, import the method getAllInflections and/or getInflection and call them directly. The method getInflection takes a lemma and a Penn Treebank tag and returns a tuple of the specific inflection(s) associated with it.

    There is a variety of tags available for getting inflections including the 'VBG' tag (Verb, Gerund) you are looking for.

    pos_type = 'A'
    * JJ      Adjective
    * JJR     Adjective, comparative
    * JJS     Adjective, superlative
    * RB      Adverb
    * RBR     Adverb, comparative
    * RBS     Adverb, superlative
    
    pos_type = 'N'
    * NN      Noun, singular or mass
    * NNS     Noun, plural
    
    pos_type = 'V'
    * VB      Verb, base form
    * VBD     Verb, past tense
    * VBG     Verb, gerund or present participle
    * VBN     Verb, past participle
    * VBP     Verb, non-3rd person singular present
    * VBZ     Verb, 3rd person singular present
    * MD      Modal
    

    Here is a sample implementation.

    #!pip install pyinflect
    from pyinflect import getInflection
    
    words = ['lead','take','hit','begin','stop','refer','visit']
    [getInflection(i, 'VBG') for i in words]
    
    [('leading',),
     ('taking',),
     ('hitting',),
     ('beginning',),
     ('stopping', 'stoping'),
     ('referring',),
     ('visiting',)]
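
Each result above is a tuple, and some verbs come back with more than one spelling (see 'stop'). If a single string per word is needed, one option is to keep the first entry. The sketch below works on the output copied from above rather than calling the library, and it assumes the first entry is the preferred spelling, which the output suggests but does not guarantee. The helper name `first_form` is hypothetical.

```python
# Output copied from the pyinflect call shown above.
results = [
    ('leading',),
    ('taking',),
    ('hitting',),
    ('beginning',),
    ('stopping', 'stoping'),
    ('referring',),
    ('visiting',),
]


def first_form(forms):
    """Return the first listed form, or None if nothing was returned."""
    return forms[0] if forms else None


gerunds = [first_form(forms) for forms in results]
print(gerunds)
# ['leading', 'taking', 'hitting', 'beginning', 'stopping', 'referring', 'visiting']
```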
    

    NOTE: The authors have also set up a more sophisticated, benchmarked library called LemmInflect, which does both lemmatization and inflection. Check it out if you want something more reliable than the library above. The syntax is pretty much the same.