r blogdown

Scrape images from tweets using R


I would love to make a Twitter-blogdown blog of the images that someone posts, but I'm not sure it is even possible. I used the 'twitteR' package to scrape all the posts from one person, but it looks like I would have to do something completely different to get images instead of text.

Any advice on what direction to take would be appreciated.


Solution

  • Quite a broad question, but here are some ideas.

    First: I recommend using the rtweet package. In my experience it makes authentication much easier and returns data in easy-to-use structures.

    As an example, here's how I'd fetch my own last 100 tweets after setting up authentication as described in the package documentation:

    library(rtweet)
    library(dplyr)
    
    neilfws <- get_timeline("neilfws", n = 100)
    neilfws %>%
      glimpse()
    

    The column media_id indicates whether a tweet has attached media; its value is NA if it does not. So a quick count of how many rows have media:

    neilfws %>%
      filter(!is.na(media_id)) %>%
      nrow()
    

    The link to the media is in the column media_url. So here are the URLs of the first 6 images associated with my tweets:

    neilfws %>% 
      filter(!is.na(media_id)) %>% 
      select(media_url) %>% 
      head()
    
    1 http://pbs.twimg.com/media/DHzGbvyVoAAm8in.jpg
    2 http://pbs.twimg.com/media/DHfc4idV0AA6qyc.jpg
    3 http://pbs.twimg.com/media/DHfNamEVYAA5H_U.jpg
    4 http://pbs.twimg.com/media/DHYuG1oUwAADV-z.jpg
    5 http://pbs.twimg.com/media/DHQlEQqUAAAHoCK.jpg
    6 http://pbs.twimg.com/media/DHLG_ESUMAAMURj.jpg
    

    Now that you have the media URLs, you can work on the code to retrieve or display them.
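
    As a rough sketch of that last step, assuming media_url holds plain image URLs (a character vector or a list column, depending on the rtweet version), something like the following could download each image and build Markdown image tags for a post. The helper names media_destinations, download_media and media_markdown, and the "images" folder, are hypothetical and not part of rtweet or blogdown.

    ```r
    # Hypothetical helper: derive a local file path for each media URL.
    media_destinations <- function(urls, dir = "images") {
      file.path(dir, basename(urls))
    }

    # Hypothetical helper: download each image, skipping files already on disk.
    download_media <- function(urls, dir = "images") {
      urls <- unlist(urls)                  # flatten in case of a list column
      urls <- unique(urls[!is.na(urls)])    # drop tweets without media
      dir.create(dir, showWarnings = FALSE, recursive = TRUE)
      dest <- media_destinations(urls, dir)
      for (i in seq_along(urls)) {
        if (!file.exists(dest[i])) {
          # mode = "wb" writes binary data, which matters for images on Windows
          tryCatch(
            download.file(urls[i], dest[i], mode = "wb", quiet = TRUE),
            error = function(e) message("Could not download: ", urls[i])
          )
        }
      }
      invisible(dest)
    }

    # Hypothetical helper: one Markdown image tag per path or URL.
    media_markdown <- function(paths) {
      sprintf("![](%s)", paths)
    }

    # Example usage (requires network access and the data frame from above):
    # dest <- download_media(neilfws$media_url)
    # writeLines(media_markdown(dest))
    ```

    Keeping the path logic separate from the download loop makes it easy to check where files will land before fetching anything. The Markdown tags also work with the remote URLs directly if you would rather not store local copies.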