python loops iteration sentiment-analysis tweets

How to print words that are not in the list


I have two files. The first is a list of tweets, and the second is a list of standard words that looks like this:

acoustics
acquaint
acquaintable
tbc....

I want to iterate through the list of tweets and print the words that are not found in the standard words list.

This is what I tried:

dk = open('wordslist.txt','r')
dlist = []
for x in dk.readlines():
    dlist.append(x.replace('\n',''))

dlist
length = len(tokenized_tweets)
for i in range(length):
    print(tokenized_tweets[i])
for x in range(len(tokenized_tweets)):
    if x[0] not in dlist:
        print(tokenized_tweets[x])

and I got this error: 'int' object is not subscriptable


Solution

  • Read and follow the error message and you'll figure out what the problem is.

    The traceback points at the line if x[0] not in dlist:, and the error message says: 'int' object is not subscriptable. What are you subscripting on that line? x. Is x really subscriptable? No, it's an int: range(len(tokenized_tweets)) yields the integers 0, 1, 2 and so on, because len() always returns an int (unless you shadow the built-in).

    You are supposed to use x as an index into tokenized_tweets, i.e. tokenized_tweets[x], or simply iterate over tokenized_tweets directly. A list is both subscriptable and iterable; an int is neither.
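
    The loop from the question can be sketched in corrected form as follows. This is only an illustration: the sample contents of tokenized_tweets and dlist are made up, and it assumes each tweet is already a list of tokens. The first few lines reproduce the error on purpose.

```python
# Hypothetical sample data standing in for the asker's variables.
tokenized_tweets = [["acoustics", "rock"], ["acquaint", "me", "pls"]]
dlist = ["acoustics", "acquaint", "acquaintable"]

# Reproduce the original mistake: indexing into an int raises TypeError.
x = 0
try:
    x[0]
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Corrected loop: iterate over the tweets themselves rather than over
# integer indices, so there is no int to subscript by mistake.
unknown_words = []
for tweet in tokenized_tweets:
    for word in tweet:
        if word not in dlist:
            unknown_words.append(word)

print(unknown_words)  # ['rock', 'me', 'pls']
```

    If the index is really needed, for i, tweet in enumerate(tokenized_tweets) gives both the position and the tweet.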

    Extra tip:

    Since you look up every word of every tweet, make a set out of your standard words. Membership testing on a set is much faster than on a list: O(1) on average versus O(n).

    It also removes duplicates if there are any.
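
    A rough way to see both effects, assuming a made-up vocabulary of 100,000 words (the names and sizes here are illustrative only, and the timings will vary by machine):

```python
import timeit

# Made-up vocabulary; the last 10 entries repeat the first 10.
words = [f"word{i}" for i in range(100_000)]
words_as_list = words + words[:10]
words_as_set = set(words_as_list)  # duplicates collapse into one entry

print(len(words_as_list), len(words_as_set))  # 100010 100000

# A word that is absent is the worst case for a list: every element is checked.
missing = "not-a-word"
list_time = timeit.timeit(lambda: missing in words_as_list, number=20)
set_time = timeit.timeit(lambda: missing in words_as_set, number=20)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

    The list lookup scans all elements on each test, while the set lookup hashes the word once, so the gap grows with the size of the word list.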

    Solution:

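    # Build a set of standard words; strip() drops the newline and stray whitespace.
    with open("wordslist.txt") as f:
        words_list = {line.strip() for line in f}

    # Print every whitespace-separated word that is not a standard word.
    with open("tweets.txt") as g:
        for tweet in g:
            for word in tweet.split():
                if word not in words_list:
                    print(word)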