Storing streamed tweets in a list for further analysis


I am building a data mining app that collects tweets with the Twitter streaming API (via tweepy) and runs a suite of NLP algorithms on them. So far, all I have managed to do is write the tweets to an external file. Because I will only collect about 100 tweets at a time (a pretty small volume), and because of deployment concerns, I would rather collect them in a dictionary or list for further analysis. However, I have not been able to do this. The code I have so far is given below:

import tweepy

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__()
        self.num_tweets = 0
        self.tweets = []

    def on_status(self, status):
        #print(status.text)
        self.num_tweets += 1
        self.tweets.append(status.text)
        if self.num_tweets > 100:
            return False

def getstreams(keyword):
    CONSUMER_KEY    = ''
    CONSUMER_SECRET = ''
    ACCESS_TOKEN  = ''
    ACCESS_SECRET = ''
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    myStreamListener = MyStreamListener()
    myStream = tweepy.Stream(auth = api.auth,listener=myStreamListener)
    tweet_list = myStream.filter(track=[keyword])
    return tweet_list.tweets

getstreams('Starbucks')

However, when I run this, all I get is:

AttributeError: 'NoneType' object has no attribute 'tweets'

pointing to the line:

return tweet_list.tweets

I'd be grateful if anyone could explain how to overcome this issue and shed some light on how to collect n tweets into a list.


Solution

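  • You can override the on_data method in your listener class. It receives each raw message as a JSON string, which you can parse and append to a list.

    def on_data(self, data):
        # data is the raw JSON string for one message; parse it into a dict
        # (this needs `import json` at the top of the file)
        tweet = json.loads(data)

        # my_tweet is a list declared at module level
        my_tweet.append(tweet)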
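
The AttributeError in the question arises because, in tweepy 3.x, Stream.filter() does not return the listener: it blocks until the stream stops and then returns None. The collected tweets therefore have to be read from the listener object itself. The following is a minimal sketch of that pattern, not a definitive implementation. So that it runs offline without credentials, it uses FakeStream, a hypothetical stand-in for tweepy.Stream; TweetCollector, get_streams and the sample messages are illustrative names as well, not part of tweepy.

```python
import json


class TweetCollector:
    """Collects up to `limit` tweet texts; mirrors a tweepy 3.x listener's on_data hook."""

    def __init__(self, limit=100):
        self.limit = limit
        self.tweets = []

    def on_data(self, data):
        # `data` is the raw JSON string for one message; parse it into a dict.
        tweet = json.loads(data)
        # Skip messages that are not tweets (they carry no "text" key).
        if "text" in tweet:
            self.tweets.append(tweet["text"])
        # Returning False asks the stream to stop once the limit is reached.
        return len(self.tweets) < self.limit


class FakeStream:
    """Hypothetical stand-in for tweepy.Stream so the sketch runs offline."""

    def __init__(self, listener, messages):
        self.listener = listener
        self.messages = messages

    def filter(self, track=None):
        # Feed messages to the listener until it returns False.
        for raw in self.messages:
            if self.listener.on_data(raw) is False:
                break
        # Like tweepy's Stream.filter, this method returns nothing (None).


def get_streams(keyword, messages, limit=100):
    listener = TweetCollector(limit=limit)
    stream = FakeStream(listener, messages)
    stream.filter(track=[keyword])  # runs until the listener stops the stream
    return listener.tweets          # read the results from the listener, not from filter()


sample = [json.dumps({"text": f"Starbucks tweet {i}"}) for i in range(5)]
collected = get_streams("Starbucks", sample, limit=3)
print(collected)
```

In the question's code, the equivalent change would be to return myStreamListener.tweets instead of tweet_list.tweets. Note also that the check self.num_tweets > 100 stops only after the 101st tweet; comparing with >= 100 stops at exactly 100.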