I'm trying to perform lemmatization only on the words in a string that have more than 4 letters. The desired output from the following code is 'us american', but I get an invalid syntax error.
import nltk
from nltk.tokenize import TweetTokenizer
lemmatizer = nltk.stem.WordNetLemmatizer()
w_tokenizer = TweetTokenizer()
wd = w_tokenizer.tokenize(('us americans'))
[lemmatizer.lemmatize(w) for w in wd if len(w)>4 else wd for wd in w]
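The problem is where the condition sits. In a comprehension, an if placed after the for clause is a filter: it decides whether an item is kept at all, and it cannot take an else. To pick one of two values for every item, use a conditional expression (x if condition else y) before the for. You could try this list comprehension: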
[lemmatizer.lemmatize(w) if len(w)>4 else w for w in wd]
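To see why the placement matters, here is a minimal sketch comparing the two forms. It assumes a hypothetical toy_lemmatize function that stands in for lemmatizer.lemmatize (it just strips a trailing "s"), so it runs without NLTK or the WordNet data; the real lemmatizer's behaviour is more involved.

```python
# Hypothetical stand-in for lemmatizer.lemmatize (NOT NLTK): strips a trailing "s".
def toy_lemmatize(word):
    return word[:-1] if word.endswith("s") else word

words = ["us", "americans"]

# Trailing `if` is a filter: words that fail the test are dropped entirely.
filtered = [toy_lemmatize(w) for w in words if len(w) > 4]

# `x if cond else y` before `for` is a conditional expression: every word is kept,
# but only words longer than 4 characters are lemmatized.
conditional = [toy_lemmatize(w) if len(w) > 4 else w for w in words]

print(filtered)     # ['american']
print(conditional)  # ['us', 'american']
```

The filter version silently loses "us", which is why the conditional expression is the form you want here.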
Then, to get a single string matching your expected output, join the results with Python's str.join method:
' '.join([lemmatizer.lemmatize(w) if len(w)>4 else w for w in wd])
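If you need this in more than one place, one way to package it is a small helper that takes the lemmatizing function as a parameter. This is a sketch under stated assumptions: lemmatize_long_words and toy_lemmatize are hypothetical names, plain whitespace splitting stands in for TweetTokenizer, and toy_lemmatize only mimics the single case in the question.

```python
from typing import Callable


def lemmatize_long_words(text: str, lemmatize: Callable[[str], str], min_len: int = 4) -> str:
    """Lemmatize only tokens longer than min_len characters; keep the rest unchanged."""
    tokens = text.split()  # assumption: whitespace split instead of TweetTokenizer
    return " ".join(lemmatize(w) if len(w) > min_len else w for w in tokens)


def toy_lemmatize(word: str) -> str:
    # Hypothetical stand-in for WordNetLemmatizer().lemmatize: strips a trailing "s".
    return word[:-1] if word.endswith("s") else word


result = lemmatize_long_words("us americans", toy_lemmatize)
print(result)  # us american
```

With NLTK installed and the WordNet corpus downloaded, you could pass lemmatizer.lemmatize in place of toy_lemmatize; passing the function in keeps the length rule separate from the choice of lemmatizer.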