Tags: python, web-scraping, tweets

Scraping one tweet per user using snscrape


I'm using the snscrape.modules.twitter.TwitterSearchScraper class to scrape tweets for a specific location and time interval. The code is the following:

import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []

# Stop after collecting 100 tweets.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i == 100:
        break
    tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])

Is there a way to get only one tweet per user? When I run the code above, some users appear more than once.


Solution

  • You could check whether tweet.user.id has already been seen before adding the tweet to your list.

    Here, I added a new list (called tweets_user_ids) to store the values of tweet.user.id. A tweet is added to tweets_list only if its tweet.user.id is not already in that list.

    Code:

    import snscrape.modules.twitter as sntwitter

    loc = '40.4165, -3.70256, 10km'
    query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
    tweets_list = []
    max_amount_of_tweets = 100
    tweets_user_ids = []  # User ids already collected, used to skip repeated users.
    i = 0  # Number of tweets kept so far.

    for tweet in sntwitter.TwitterSearchScraper(query).get_items():
        # Keep the tweet only if this user has not been seen yet.
        if tweet.user.id not in tweets_user_ids:
            tweets_user_ids.append(tweet.user.id)
            tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
            i += 1

        # Stop once the maximum number of tweets is reached.
        if i == max_amount_of_tweets:
            break

    print(tweets_list)
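
The same check can also be wrapped in a small reusable function that tracks seen user ids in a set, so each membership test stays cheap as the number of users grows. This is only a sketch: first_tweet_per_user is a hypothetical helper name (not part of snscrape), and it assumes every item exposes user.id as in the code above. The fake tweets built with SimpleNamespace are stand-ins so the logic can be tried without network access.

```python
from itertools import islice
from types import SimpleNamespace


def first_tweet_per_user(tweets, limit=None):
    """Return an iterator over the first tweet seen for each user id.

    Hypothetical helper, not part of snscrape. Assumes each item has
    a `user.id` attribute. `limit=None` means no cap.
    """
    def unique():
        seen_user_ids = set()
        for tweet in tweets:
            if tweet.user.id not in seen_user_ids:
                seen_user_ids.add(tweet.user.id)
                yield tweet

    # islice stops pulling from the source once `limit` items were produced.
    return islice(unique(), limit)


# Stand-in data: (user id, text) pairs with repeated users.
fake_tweets = [
    SimpleNamespace(user=SimpleNamespace(id=uid), rawContent=text)
    for uid, text in [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
]

capped = [t.rawContent for t in first_tweet_per_user(fake_tweets, limit=2)]
uncapped = [t.rawContent for t in first_tweet_per_user(fake_tweets)]
print(capped)
print(uncapped)
```

With the scraper from the answer, the input would be sntwitter.TwitterSearchScraper(query).get_items() and limit would be max_amount_of_tweets.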