I have a text document that I need to apply stemming and lemmatization to. I have already cleaned the data, tokenised it, and removed stop words.
What I need to do is take the list as input and return a dict, where the dict has the keys 'original', 'stem' and 'lemma', and the values are the nth word transformed in that way.
The Snowball stemmer is defined as Stemmer() and the WordNetLemmatizer is defined as lemmatizer().
Here's the code I've written, but it gives an error:
def find_roots(token_list, n):
    n = 2
    original = tokens
    stem = [ele for sub in original for idx, ele in
            enumerate(sub.split()) if idx == (n - 1)]
    stem = stemmer(stem)
    lemma = [ele for sub in original for idx, ele in
             enumerate(sub.split()) if idx == (n - 1)]
    lemma = lemmatizer()
    return
Any help would be appreciated.
I really don't understand what you are trying to do in the list comprehensions, so I'll just write how I would do it:
from nltk import WordNetLemmatizer, SnowballStemmer

lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")

def find_roots(token_list, n):
    # Pick out the nth token (0-indexed), then stem and lemmatize it
    token = token_list[n]
    stem = stemmer.stem(token)
    lemma = lemmatizer.lemmatize(token)
    return {"original": token, "stem": stem, "lemma": lemma}
roots_dict = find_roots(["said", "talked", "walked"], n=2)
print(roots_dict)
> {'original': 'walked', 'stem': 'walk', 'lemma': 'walked'}
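Note that the lemma of 'walked' comes back unchanged because WordNetLemmatizer treats every word as a noun by default. If you know the part of speech (or get it from a POS tagger), you can pass it explicitly. A minimal sketch, reusing the lemmatizer defined above and assuming your tokens are verbs:

# WordNetLemmatizer.lemmatize defaults to pos='n' (noun),
# so verb inflections like 'walked' are left untouched
print(lemmatizer.lemmatize("walked"))           # 'walked'

# Passing pos='v' makes WordNet lemmatize it as a verb
print(lemmatizer.lemmatize("walked", pos="v"))  # 'walk'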