I have a script that downloads a number of tweets using Tweepy's Cursor function. The problem is that when I specify the number of tweets to download, Tweepy seems to return far more than that, and about 90 percent of them are duplicates. My exact code snippet is below.
qw = ['Pele']
tweet_dataset = pd.DataFrame(columns=['Tweet_id','Author'])
for tweet in tw.Cursor(api.search_tweets,tweet_mode='extended', q=qw).items(5):
    appending_dataframe = pd.DataFrame([[tweet.id,tweet.author.screen_name]],
                                       columns=['Tweet_id','Author'])
    tweet_dataset = tweet_dataset.append(appending_dataframe)
    print(tweet_dataset[['Author','Tweet_id']].head())
I only want the script above to return 5 tweets. Instead, it prints output on every pass of the loop: 1 tweet the first time, 2 tweets the second time, and so on, until the fifth pass prints 5 tweets. Please see the results below: (https://i.sstatic.net/Dnm7y.png)
I only want, say, 5 tweets from Cursor, not 5 groups of tweets, which is what Cursor appears to return.
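By default, the head method returns the first 5 rows of a DataFrame.
Because the print call is inside the loop, you are printing the first 5 rows at every iteration. That gives 1 row in the first iteration (the DataFrame only has one row at that point), 2 rows in the second iteration, and so on. Cursor is returning exactly 5 tweets; the apparent duplicates are the same rows printed again.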
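To make the effect visible without calling the Twitter API, here is a minimal sketch that grows a DataFrame one row at a time and records how many rows head() returns on each pass. The tweet ids and author names are made-up placeholders, and pd.concat stands in for DataFrame.append (which was removed in pandas 2.0).

```python
import pandas as pd

# Placeholder rows standing in for the 5 tweets Cursor would yield (not real data).
placeholder_rows = [[1000 + i, f"user_{i}"] for i in range(5)]

frames = []
rows_printed_per_pass = []
for row in placeholder_rows:
    frames.append(pd.DataFrame([row], columns=["Tweet_id", "Author"]))
    tweet_dataset = pd.concat(frames, ignore_index=True)
    # head() returns at most 5 rows, so its length grows with the DataFrame.
    rows_printed_per_pass.append(len(tweet_dataset.head()))

print(rows_printed_per_pass)
```

The DataFrame itself ends up with 5 rows; only the number of printed rows changes from one pass to the next.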
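.head(1) would instead print one row per iteration, but note that it is always the first row of the DataFrame, not the newest one. To see the 5 tweets exactly once, move the print call out of the loop so that it runs after all the rows have been added.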
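A sketch of the "print once, after the loop" approach could look like the following. It assumes the objects yielded by Cursor expose id and author.screen_name as in the question; here they are replaced by hypothetical SimpleNamespace stand-ins so that the snippet runs without API credentials. Rows are collected in a plain list and the DataFrame is built once, which also avoids DataFrame.append (removed in pandas 2.0).

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-ins for the objects that
# tw.Cursor(api.search_tweets, tweet_mode='extended', q=qw).items(5) would yield.
fake_tweets = [
    SimpleNamespace(id=1000 + i, author=SimpleNamespace(screen_name=f"user_{i}"))
    for i in range(5)
]

rows = []
for tweet in fake_tweets:
    # Only collect data inside the loop; do not print here.
    rows.append({"Tweet_id": tweet.id, "Author": tweet.author.screen_name})

# Build the DataFrame once, then print once.
tweet_dataset = pd.DataFrame(rows, columns=["Tweet_id", "Author"])
print(tweet_dataset[["Author", "Tweet_id"]].head())
```

With real data, the for loop would iterate over the Cursor call from the question instead of fake_tweets; the rest stays the same.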