I am trying to find which words appear most often. But each time I run FreqDist, it returns the most common letters instead of the most common words:
FreqDist({' ': 496, 'e': 306, 't': 205, 'a': 182, 's': 181, 'n': 160, 'o': 146, 'r': 142, 'i': 118, 'l': 110, ...})
Here is my code:
newdf['tokens1'] = newdf['review'].apply(word_tokenize)
newdf['tokens1'] = newdf['tokens1'].apply(str)
review_comments = ''
for i in range(newdf.shape[1]):
    # Add each comment.
    review_comments = review_comments + newdf['tokens1'][i]
from nltk.probability import FreqDist
fdist = FreqDist(review_comments)
fdist
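To see where the letters come from, here is a small stdlib-only sketch. It relies on the fact that nltk.probability.FreqDist is a subclass of collections.Counter, so Counter counts the same way. The token list is a hypothetical stand-in for one row of the tokens1 column. It shows that .apply(str) turns each token list into one string, and that counting a string counts its characters.

```python
from collections import Counter

# Hypothetical stand-in for one row of 'tokens1' after word_tokenize:
# a list of word tokens.
tokens = ["great", "movie", "great", "cast"]

# .apply(str) replaces the list with its printed form, a single string.
as_string = str(tokens)
print(as_string)  # ['great', 'movie', 'great', 'cast']

# Counting a string iterates over it one character at a time
# (letters, quotes, commas, brackets and spaces all get counted)...
char_counts = Counter(as_string)

# ...whereas counting the list iterates over whole tokens.
word_counts = Counter(tokens)

print(word_counts.most_common(1))  # [('great', 2)]
```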
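You first need to tokenize the string with word_tokenize. FreqDist counts the items of whatever iterable it is given, and iterating over a string yields single characters, so passing review_comments in directly counts letters. Give it a list of word tokens instead: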
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
tokens = word_tokenize(review_comments)
fdist = FreqDist(tokens)
fdist
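Another option is to skip the str round-trip and count the per-review token lists directly. The sketch below is an assumption-based illustration: token_lists is a hypothetical stand-in for the tokens1 column before the .apply(str) step. It uses collections.Counter so it runs without NLTK. FreqDist, being a Counter subclass, accepts the same iterable.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for newdf['tokens1'] *before* .apply(str):
# each entry is the list word_tokenize returned for one review.
token_lists = [
    ["great", "movie", "!"],
    ["the", "movie", "was", "great"],
    ["boring", "movie"],
]

# Flatten the per-review lists into one stream of tokens and count them.
# nltk.probability.FreqDist(...) could be used here in place of Counter.
word_counts = Counter(chain.from_iterable(token_lists))

print(word_counts.most_common(2))  # [('movie', 3), ('great', 2)]
```

With the DataFrame from the question, the equivalent would be FreqDist(chain.from_iterable(newdf['tokens1'])), run without the .apply(str) line. This avoids the brackets, quotes and commas that the string form adds, which may otherwise show up as tokens. Iterating the column directly also covers every row, whereas range(newdf.shape[1]) only runs as many times as there are columns, since shape[1] is the column count.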