train_data = ["Consultant, changing, Waiting"]
I'm trying to apply the stemmer to the data with the following code, but it keeps the original data:
from nltk import stem

stemmer = stem.porter.PorterStemmer()
train_stemmer = train_data
for i in range(len(train_stemmer)):
    train_stemmer[i] = stemmer.stem(train_stemmer[i])
The code runs fine but does not produce my expected result, which is:
["Consult, change, Wait"]
Two things jump out:
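train_data in your question is a list containing one string (like ["Consult, change, Wait"]) rather than a list of three strings (like ["Consult", "change", "Wait"]). The stemmer treats each list element as a single word, so a one-string list is stemmed as one long "word" rather than word by word.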
If you intended for the list to contain one string, this should work fine:
from nltk.stem import porter
stemmer = porter.PorterStemmer()
# List of one string
string_in_list = ["Consult, change, Wait"]
for word in string_in_list:
    print(stemmer.stem(word))
print("----")
If you wanted a list of three strings, quote each word separately so that the commas separate list elements:
# List of three strings
individual_words = ["Consult", "change", "Wait"]
for word in individual_words:
    print(stemmer.stem(word))
print("----")
By default, stem() lowercases its input. Preserving the original case requires passing to_lowercase=False, which can make sense if you're trying to handle proper nouns (e.g. to distinguish chang, the stem of change, from the name Chang).
# Stem without lowercasing the input first
for word in individual_words:
    print(stemmer.stem(word, to_lowercase=False))
Expected output when all three snippets run in order:
consult, change, wait
----
consult
chang
wait
----
Consult
chang
Wait
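If the data really does arrive as a single comma-separated string, as in the question, one option is to split each string on commas and stem the pieces individually. The following is a minimal sketch, not part of NLTK: stem_each is a hypothetical helper name, and it assumes the words are separated by commas with optional surrounding spaces. It takes the stemming function as an argument, so any callable that maps a word to its stem can be passed in.

```python
def stem_each(items, stem):
    """Stem every comma-separated word inside each string of `items`.

    `stem` is any callable mapping one word to its stem,
    for example nltk's PorterStemmer().stem (an assumption of this sketch).
    """
    stemmed_items = []
    for item in items:
        # Split "a, b, c" into ["a", "b", "c"], dropping surrounding spaces.
        words = [word.strip() for word in item.split(",")]
        # Stem each word on its own, then rejoin with ", " as in the input.
        stemmed_items.append(", ".join(stem(word) for word in words))
    return stemmed_items


# Usage with NLTK (assumes nltk is installed):
#   from nltk.stem.porter import PorterStemmer
#   train_data = ["Consultant, changing, Waiting"]
#   print(stem_each(train_data, PorterStemmer().stem))
```

Note that, as the output above shows for change, the Porter stem is chang rather than change, so the result will not match the expected output in the question exactly.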