Tags: python, twitter, tweepy, twitter-streaming-api, mining

How to stream tweets using tweepy from a start date to end date using python?


I am currently doing research that applies sentiment analysis to Twitter data about a certain topic (the topic isn't important for this question). I am using Python, which I am a beginner at. I understand that the Twitter streaming API limits users to the previous 7 days unless you apply for full enterprise search, which opens up the whole archive. Twitter recently gave me access to the full archive for this research project, but I am unable to specify a start and end date for the tweets I would like to stream into a CSV file. This is my code:

import csv
import json

import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener


ckey = 'xxxxxxxxxxxxxxxxxxxxxxx'
csecret = 'xxxxxxxxxxxxxxxxxxxxxxx'
atoken = 'xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxx'
asecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'


# =============================================================================
# def sentimentAnalysis(text):
#     output = '0'
#     return output
# =============================================================================

class listener(StreamListener):
    def on_data(self, data):
        # Parse the raw JSON payload instead of splitting the string by hand
        status = json.loads(data)
        tweet = status.get('text')
        if tweet is None:
            # Delete and limit notices have no "text" field; skip them
            return True
        # csv.writer quotes any commas, quotes or newlines inside the tweet
        with open('output.csv', 'a', newline='', encoding='utf-8') as output:
            csv.writer(output).writerow([tweet])
        return True

    def on_error(self, status):
        print(status)
        if status == 420:
            # Returning False disconnects the stream after a rate-limit error
            return False

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["#weather"], languages = ["en"])

This code streams Twitter data from the past 7 days perfectly. I tried changing the last line to

twitterStream.filter(track=["#weather"], languages = ["en"], since = ["2016-06-01"])

but this raises the following error: TypeError: filter() got an unexpected keyword argument 'since'.

What is the correct way to filter by a given date range?
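An "unexpected keyword argument" TypeError means the called function's signature has no parameter with that name, so Python rejects the call before any request is sent. One way to check what a method accepts is inspect.signature from the standard library. The sketch below uses a hypothetical stand-in, fake_filter, rather than Tweepy itself; accepts_keyword is also a hypothetical helper name.

```python
import inspect


def fake_filter(track=None, languages=None):
    """Hypothetical stand-in for a method that has no 'since' parameter."""
    return {"track": track, "languages": languages}


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    for p in inspect.signature(func).parameters.values():
        # A **kwargs parameter swallows any keyword name
        if p.kind == inspect.Parameter.VAR_KEYWORD:
            return True
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


print(accepts_keyword(fake_filter, "languages"))  # True
print(accepts_keyword(fake_filter, "since"))      # False

try:
    fake_filter(track=["#weather"], languages=["en"], since=["2016-06-01"])
except TypeError as exc:
    # Prints the TypeError message naming the rejected keyword 'since'
    print(exc)
```

With Tweepy installed, print(inspect.signature(Stream.filter)) should list the real parameters the same way.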


Solution

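  • Tweepy's Stream.filter() does not provide a "since" argument, as you can verify in its signature in the Tweepy source. This is not only a Tweepy limitation: the streaming API delivers tweets in real time as they are posted, so it cannot return tweets from a past date range.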

    To get tweets from a past date range, use api.user_timeline instead and page back through the timeline until the desired start date is reached. For example:

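    import datetime
    
    import tweepy
    
    # The consumer keys can be found on your application's Details
    # page located at https://dev.twitter.com/apps (under "OAuth settings")
    consumer_key = ""
    consumer_secret = ""
    
    # The access tokens can be found on your application's Details
    # page located at https://dev.twitter.com/apps (located
    # under "Your access token")
    access_token = ""
    access_token_secret = ""
    
    # Placeholders: replace with the account and date range you need
    username = "screen_name_here"
    start_date = datetime.date(2016, 6, 1)
    end_date = datetime.date(2016, 6, 30)
    
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    
    # Sleep automatically when the rate limit is hit instead of a fixed time.sleep()
    api = tweepy.API(auth, wait_on_rate_limit=True)
    
    # Cursor handles pagination; the timeline comes back newest first
    for tweet in tweepy.Cursor(api.user_timeline, screen_name=username, count=200).items():
        # Compare dates, not date vs datetime (that raises TypeError)
        tweet_date = tweet.created_at.date()
        if tweet_date > end_date:
            continue  # newer than the range: keep going back in time
        if tweet_date < start_date:
            break  # older than the range: no later tweet can match
        # Do the tweet process here
        print(tweet.created_at, tweet.text)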
    

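    Note that you will need to adapt this code to your needs; it is only a general solution. Also keep in mind that user_timeline returns a single account's tweets (it is not a hashtag search) and can only return up to 3,200 of that account's most recent tweets.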
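The approach above relies on the timeline being returned newest first: skip tweets newer than the end date, and stop at the first tweet older than the start date. Below is a minimal offline sketch of just that windowing logic. FakeTweet and tweets_in_range are hypothetical names, FakeTweet stands in for a Tweepy status object (only its created_at attribute matters), and the 2016-06-30 end date is a placeholder.

```python
import datetime
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a Tweepy status; only created_at is used."""
    text: str
    created_at: datetime.datetime


def tweets_in_range(tweets: Iterable[FakeTweet],
                    start: datetime.date,
                    end: datetime.date) -> Iterator[FakeTweet]:
    """Yield tweets dated start..end (inclusive) from a newest-first iterable."""
    for tweet in tweets:
        # .date() works for both naive and timezone-aware datetimes
        day = tweet.created_at.date()
        if day > end:
            continue  # newer than the window: keep paging back
        if day < start:
            break  # newest-first order, so nothing after this can match
        yield tweet


utc = datetime.timezone.utc
timeline = [  # newest first, like a user timeline
    FakeTweet("c", datetime.datetime(2016, 7, 2, 9, 0, tzinfo=utc)),
    FakeTweet("b", datetime.datetime(2016, 6, 15, 12, 0, tzinfo=utc)),
    FakeTweet("a", datetime.datetime(2016, 6, 1, 8, 30, tzinfo=utc)),
    FakeTweet("old", datetime.datetime(2016, 5, 20, 18, 0, tzinfo=utc)),
]

start = datetime.date.fromisoformat("2016-06-01")
end = datetime.date.fromisoformat("2016-06-30")
print([t.text for t in tweets_in_range(timeline, start, end)])  # ['b', 'a']
```

If you pass in a lazily paginated iterable, such as the items() of a Tweepy Cursor, the early break also stops any further page requests.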