I'm using the snscrape.modules.twitter.TwitterSearchScraper() function to scrape tweets for a specific location and time interval. The code is the following:
loc ='40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    if i==100:
        break
    tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
My question is whether there is a way to get only one tweet per user, because when I run the code above some users are repeated.
You could check whether the tweet.user.id has already been seen before adding the tweet to your list.
Here, I added a new list (called tweets_user_ids) to store the values of tweet.user.id. A tweet is added to the tweets_list variable only if its tweet.user.id does not already exist in the new list.
Code:
import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []
max_amount_of_tweets = 100
tweets_user_ids = []  # User ids already collected; used to check for and avoid duplicates.
i = 0  # I suppose this is an incremental value: the number of tweets collected so far.
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    # Add the tweet only if this user id has not been added already:
    if tweet.user.id not in tweets_user_ids:
        tweets_user_ids.append(tweet.user.id)
        tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
        i += 1  # Increment only when a tweet is actually kept.
    # Break the loop when the max amount of tweets is reached.
    if i == max_amount_of_tweets:
        break
print(tweets_list)
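As a side note, the same "first tweet per user" check can be written with a set instead of a list, so each membership test does not scan every id collected so far. The sketch below is only an illustration: one_per_user and user_id_of are hypothetical names, not part of snscrape, and the sample data are plain tuples standing in for tweet objects so the snippet runs without network access.

```python
def one_per_user(tweets, user_id_of, limit):
    """Yield the first tweet seen for each user id, stopping after `limit` tweets.

    `tweets` is any iterable; `user_id_of` is a function that extracts the
    user id from one item (both names are illustrative assumptions).
    """
    if limit <= 0:
        return
    seen_ids = set()  # set membership tests are O(1) on average
    for tweet in tweets:
        uid = user_id_of(tweet)
        if uid in seen_ids:
            continue  # this user already contributed a tweet; skip it
        seen_ids.add(uid)
        yield tweet
        if len(seen_ids) >= limit:
            return  # stop before pulling any further items from the source


# Stand-in data: (user_id, text) tuples instead of real tweet objects.
sample = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
first_two = list(one_per_user(sample, lambda t: t[0], 2))
all_unique = list(one_per_user(sample, lambda t: t[0], 10))
print(first_two)
print(all_unique)
```

With the scraper from the answer, the call would presumably look like one_per_user(sntwitter.TwitterSearchScraper(query).get_items(), lambda t: t.user.id, 100), building each row of tweets_list from the yielded tweets.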