[SOLVED] Downloading tweets with Tweepy

I have a script that downloads a number of tweets using Tweepy's Cursor. The issue is that when I specify the number of tweets to download, Tweepy appears to download far more than that, and about 90 percent of them are duplicates. Below is my exact code snippet.

qw = ['Pele']
tweet_dataset = pd.DataFrame(columns=['Tweet_id','Author'])

for tweet in tw.Cursor(api.search_tweets,tweet_mode='extended', q=qw).items(5):
        appending_dataframe = pd.DataFrame([[tweet.id,tweet.author.screen_name]],
                                           columns=['Tweet_id','Author'])
        tweet_dataset = tweet_dataset.append(appending_dataframe)
        print(tweet_dataset[['Author','Tweet_id']].head())

I want the script above to return only 5 tweets. Instead, it loops: the first time it returns 1 tweet, the second time 2 tweets, and so on, until the fifth time, when it returns 5 tweets. Please see the snippet of the results below: (https://i.sstatic.net/Dnm7y.png)

I only want, say, 5 tweets from Cursor, not 5 groups of tweets as Cursor seems to return them.

Solution

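The head method returns the first 5 rows by default.

Therefore, the print call inside the loop prints up to the first 5 rows of the DataFrame at every iteration: 1 row in the first iteration, because there is only one row so far, 2 rows in the second iteration, and so on. The Cursor itself yields only 5 tweets.

.head(1) would instead print a single row per iteration, but always the first row of the DataFrame; .tail(1) prints the row that was just added. To print the 5 tweets once, move the print call out of the loop.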
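The behaviour can be reproduced without calling the Twitter API. The following is a minimal sketch, assuming a hypothetical list `fake_tweets` that stands in for the 5 items yielded by the Cursor; the names `rows` and `printed_lengths` are also illustrative. It records how many rows head() returns at each iteration, then builds and prints the DataFrame once after the loop. Collecting rows in a list also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for the 5 tweets the Cursor would yield
fake_tweets = [
    (101, "alice"),
    (102, "bob"),
    (103, "carol"),
    (104, "dave"),
    (105, "erin"),
]

rows = []             # one dict per tweet, collected inside the loop
printed_lengths = []  # number of rows head() would show at each iteration

for tweet_id, author in fake_tweets:
    rows.append({"Tweet_id": tweet_id, "Author": author})
    partial = pd.DataFrame(rows, columns=["Tweet_id", "Author"])
    # head() returns at most the first 5 rows of the DataFrame built so far
    printed_lengths.append(len(partial.head()))

# Build the final DataFrame once and print it after the loop
tweet_dataset = pd.DataFrame(rows, columns=["Tweet_id", "Author"])
print(printed_lengths)
print(tweet_dataset[["Author", "Tweet_id"]])
```

The growing counts in `printed_lengths` correspond to the "groups" seen in the question's output, while the final DataFrame holds exactly 5 rows.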