
Python NLTK: stemming a list of sentences/phrases


I have a bunch of sentences in a list and I want to use the nltk library to stem them. I can stem one sentence at a time, but I am having trouble stemming the sentences from a list and joining the stems back together. Is there a step I am missing? I am quite new to the nltk library. Thanks!

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
ps = PorterStemmer()

# Success: one sentence at a time
data = 'the gamers playing games'
words = word_tokenize(data)
for w in words:
    print(ps.stem(w))


# Fails: 

data_list = ['the gamers playing games',
            'higher scores',
            'sports']
words = word_tokenize(data_list)
for w in words:
    print(ps.stem(w))

# Error: TypeError: expected string or bytes-like object
# result should be: 
['the gamer play game',
 'higher score',
 'sport']

Solution

  • word_tokenize expects a single string, but you are passing it a list, which is what raises the TypeError.

    The solution is to wrap your logic in another for-loop that tokenizes one sentence at a time:

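    from nltk.tokenize import word_tokenize

    data_list = ['the gamers playing games', 'higher scores', 'sports']
    for sentence in data_list:
        # word_tokenize takes one string, so tokenize each sentence separately
        words = word_tokenize(sentence)
        for w in words:
            print(ps.stem(w))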
    
    Output:
    the
    gamer
    play
    game
    higher
    score
    sport
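
The loop above prints one stem per line, whereas the question asks for each sentence joined back into a single string. A minimal sketch of that last step follows, assuming the same PorterStemmer and word_tokenize setup as in the question. The helper name `stem_sentences` and its `tokenizer` parameter are made up for illustration and are not part of NLTK.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

def stem_sentences(sentences, tokenizer=word_tokenize):
    """Return one stemmed string per input sentence.

    `stem_sentences` and `tokenizer` are hypothetical names for this sketch.
    """
    stemmed = []
    for sentence in sentences:
        # Tokenize a single string, stem each token, then rejoin with spaces.
        stems = [ps.stem(token) for token in tokenizer(sentence)]
        stemmed.append(" ".join(stems))
    return stemmed

data_list = ['the gamers playing games', 'higher scores', 'sports']
```

Provided the tokenizer data that word_tokenize relies on has been downloaded with nltk.download, calling `stem_sentences(data_list)` should give the list shown in the question. Joining with a single space is a simplification: word_tokenize splits punctuation into separate tokens, so sentences containing punctuation will come back with extra spaces around it.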