python porter-stemmer

After stemming a dataset, some words come out incorrectly


import nltk

tokenize_texts = [['mentioned', 'reviewers', 'episode', 'exactly']]

# Create the stemmer once and reuse it for every word
stemmer = nltk.stem.PorterStemmer()

porter_stemmed_texts = []
for tokens in tokenize_texts:
    porter_stemmed_texts.append([stemmer.stem(word) for word in tokens])

porter_stemmed_texts

Output:

[['mention', 'review', 'episod', 'exactli']]

Expected output:

[['mention', 'review', 'episode', 'exactly']]

Are these errors normal? Is there no way to get 100% accurate words?


Solution

  • The stemmer is working as intended.

    "Episode" should stem to "episod" so that it stems the same way as "episodic".

    "Exactly" -> "exactli" is a quirk of the algorithm, but it makes no difference in the end: you should also be stemming the text you're comparing against, so that text will contain 'exactli' once stemmed as well.
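
To illustrate the point about stemming both sides, here is a minimal sketch that stems a document and a query with the same stemmer and then compares the stems. The helper `stem_tokens` and the sample `query` list are made up for this example; only `PorterStemmer` and its `stem` method come from NLTK.

```python
from nltk.stem import PorterStemmer

# One shared stemmer, so both sides are normalized the same way
stemmer = PorterStemmer()


def stem_tokens(tokens):
    """Return the Porter stem of every token (hypothetical helper)."""
    return [stemmer.stem(token) for token in tokens]


document = ['mentioned', 'reviewers', 'episode', 'exactly']
query = ['episodic', 'exactly']  # illustrative search terms

stemmed_document = stem_tokens(document)
stemmed_query = stem_tokens(query)

# Compare stems with stems, never raw words with stems
matches = [term for term in stemmed_query if term in stemmed_document]

print(stemmed_document)
print(stemmed_query)
print(matches)
```

Both query terms should match here even though neither 'episod' nor 'exactli' is a dictionary word: a stem only has to be consistent across the texts being compared, not readable.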