I'm trying to use a library called snowballstemmer in Python, but it seems that it's not working as expected. What could the reason be? Please see my code below.
My data set:
df = [['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
      ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']]
I applied the snowballstemmer package, importing TurkishStemmer:
from snowballstemmer import TurkishStemmer
turkStem=TurkishStemmer()
data_words_nostops=[turkStem.stemWord(word) for word in df]
data_words_nostops
[['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']]
Unfortunately, the words come back unchanged. When I apply it to a single word, however, it works as expected:
turkStem.stemWord("islemlerimde")
'islem'
What could be the problem? Any help will be appreciated.
Thank you.
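Your df is a list of lists, so `for word in df` iterates over the two inner lists, not over the individual words. stemWord is therefore never called on a single string, which is why the lists come back unchanged. Did you mean to have a flat list of strings instead of a list of lists containing strings?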
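A quick way to confirm what the comprehension's loop variable actually holds is to inspect its type for each iteration, using the same data as in the question:

```python
df = [['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
      ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']]

# Iterating over the outer list binds each *inner list* to the loop variable,
# so stemWord(word) in the original comprehension receives a list, not a string.
loop_items = [type(word).__name__ for word in df]
print(loop_items)  # ['list', 'list']
```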
I was able to get the stems for each word when I reformatted your code this way:
from snowballstemmer import TurkishStemmer
df = [
'musteri',
'hizmetlerine',
'cabuk',
'baglaniyorum',
'konuda',
'yardımcı',
'oluyorlar',
'islemlerimde'
]
turkStem = TurkishStemmer()
data_words_nostops = [turkStem.stemWord(word) for word in df]
print(data_words_nostops)
If you have a list of lists of strings (let's say it's what you've defined as df) and you want to flatten it down to a single list of words, you can do something like this:
df = [
['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
]
flattened_df = [item for sublist in df for item in sublist]
print(flattened_df)
# Output:
# ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum', 'konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
Credit for the above goes to this StackOverflow post.
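If you prefer the standard library over a nested comprehension, itertools.chain.from_iterable should produce the same flat list. This is just an alternative sketch of the flattening step; the result can then be passed to the stemming comprehension from the first example:

```python
from itertools import chain

df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
]

# chain.from_iterable yields the items of each inner list in order,
# giving the same flat list as the nested comprehension above.
flattened_df = list(chain.from_iterable(df))
print(flattened_df)
```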
Alternatively, you can keep your original nested layout and fix the loop instead, stemming each word inside each inner list:
from snowballstemmer import TurkishStemmer

df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
]

turkStem = TurkishStemmer()
all_stem_lists = []

# Outer loop: one inner list per document; inner loop: one string per word
for word_group in df:
    output_stems = []
    for word in word_group:
        stem = turkStem.stemWord(word)
        output_stems.append(stem)
    all_stem_lists.append(output_stems)

print(all_stem_lists)
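The explicit loops above can also be condensed into a nested list comprehension. The sketch below wraps it in a small helper, stem_nested (a hypothetical name, not part of snowballstemmer), that takes the stemming function as a parameter. fake_stem is likewise a hypothetical stand-in that just keeps the first five characters, so the example runs without the library installed; in your code you would pass turkStem.stemWord instead.

```python
from typing import Callable, List


def stem_nested(stem: Callable[[str], str], docs: List[List[str]]) -> List[List[str]]:
    """Apply `stem` to every word, keeping one inner list per document."""
    return [[stem(word) for word in doc] for doc in docs]


def fake_stem(word: str) -> str:
    # Hypothetical stand-in for TurkishStemmer().stemWord: NOT real stemming,
    # it only truncates each word so the sketch is runnable on its own.
    return word[:5]


df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
]

result = stem_nested(fake_stem, df)
print(result)
```

With snowballstemmer installed, `stem_nested(turkStem.stemWord, df)` should return the same nested structure as all_stem_lists. snowballstemmer stemmers also provide a stemWords method that stems a list of words, so `[turkStem.stemWords(doc) for doc in df]` is another way to write the same thing.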