
porter-stemmer: Stemming in Python is not working


  train_data = ["Consultant, changing,  Waiting"]

I'm trying to apply the stemmer to the data with the following code, but it keeps the original data:

    from nltk import stem

    stemmer = stem.porter.PorterStemmer()

    train_stemmer = train_data

    # Stem each list element in place
    for i in range(len(train_stemmer)):
        train_stemmer[i] = stemmer.stem(train_stemmer[i])

The code runs fine but does not produce my expected result, which is:

["Consult, change, Wait"]

Solution

  • Two things jump out:

    1. train_data in your question is a list containing one string, like ["Consult, change, Wait"], rather than a list of three strings, like ["Consult", "change", "Wait"]. The stemmer treats that one string as a single word.
    2. The stemmer converts its input to lowercase by default.

    If you intended the list to contain one string, this runs fine, but the whole string is stemmed as a single token, so only the case changes here:

    from nltk.stem import porter
    
    stemmer = porter.PorterStemmer()
    
    # List of one string
    string_in_list = ["Consult, change, Wait"]
    for word in string_in_list:
        print(stemmer.stem(word))
    print("----")
    

    If you wanted a list of three strings, put quotes around each word:

    # List of three strings
    individual_words = ["Consult", "change", "Wait"]
    for word in individual_words:
        print(stemmer.stem(word))
    print("----")
    

    Keeping the original case requires passing to_lowercase=False, which can make sense if you're trying to handle proper nouns (e.g. to distinguish the stem of change from the name Chang).

    # Stem without converting the word to lowercase
    for word in individual_words:
        print(stemmer.stem(word, to_lowercase=False))
    

    Expected output when all three run:

    consult, change, wait
    ----
    consult
    chang
    wait
    ----
    Consult
    chang
    Wait
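
If the data has to stay in the single-string form shown in the question, one option is to split each string on commas before stemming. The following is only a sketch: the helper name stem_comma_separated is made up for illustration, and it assumes an NLTK version whose PorterStemmer.stem accepts to_lowercase, as used above. Note that the Porter stem of "changing" is "chang", not "change".

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# The question's data: one string holding three comma-separated words.
train_data = ["Consultant, changing,  Waiting"]


def stem_comma_separated(text, to_lowercase=True):
    """Hypothetical helper: split on commas, trim whitespace, stem each word."""
    words = [part.strip() for part in text.split(",") if part.strip()]
    return [stemmer.stem(word, to_lowercase=to_lowercase) for word in words]


# One list of stems per original string.
stemmed = [stem_comma_separated(entry) for entry in train_data]
print(stemmed)  # [['consult', 'chang', 'wait']]
```

To get one comma-separated string back per entry, join each inner list with ", ".join(...).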