r, web-scraping, rvest, webharvest

How to web-scrape share counts in R


I am trying to download the share count from the SumoMe plugin on the left side of this website: http://www.r-bloggers.com/erum-2016-first-european-conference-for-the-programming-language-r/

I tried R code based on the rvest package:

> library(rvest)
Loading required package: xml2
> url <- 'http://www.r-bloggers.com/erum-2016-first-european-conference-for-the-programming-language-r/'
> read_html(url) %>%
+   html_nodes('.wpusb-counts span')
{xml_nodeset (1)}
[1] <span data-element="total-share"></span>

But I received an empty response. The page appears to start with a share count of 0, which is then updated a few seconds after the page loads. Could someone suggest a possible solution or recommend a package? Is RSelenium a good fit for this? I haven't used it before.


Solution

  • It looks like that value is loaded asynchronously by JavaScript, so yes, RSelenium may be your best bet. I ended up copying the XPath selector from Firebug and passing it to browser$findElement; a polling variant that waits for the count to load is sketched after the output below.

    library(RSelenium)
    
    # connect to a Selenium server running on the default host/port
    browser <- remoteDriver()
    browser$open()
    browser$navigate('http://www.r-bloggers.com/erum-2016-first-european-conference-for-the-programming-language-r/')
    # XPath copied from Firebug for the total-share <span>
    value <- browser$findElement(using = 'xpath', '/html/body/div[5]/div/div[1]/div/span')
    print(value$getElementText())
    
    [[1]]
    [1] "7"