Harvesting data with rvest retrieves no value from data-widget


I'm trying to harvest data using rvest (also tried using XML and selectr) but I am having difficulties with the following problem:

In my browser's web inspector, the HTML looks like this:

<span data-widget="turboBinary_tradologic1_rate" class="widgetPlaceholder widgetRate rate-down">1226.45</span>

(Note: rate-down and 1226.45 are updated periodically.) I want to harvest the 1226.45, but when I run my code (below) it says there is no information stored there. Does this have something to do with the fact that it's a widget? Any suggestions on how to proceed would be appreciated.

library(rvest)
library(selectr)
library(XML)

zoom.turbo.url <- "https://www.zoomtrader.com/trade-now?game=turbo"
zoom.turbo <- read_html(zoom.turbo.url)
# Navigate to the node (the 90th <span> on the page)
zoom.turbo <- zoom.turbo %>% html_nodes("span") %>% `[[`(90)

# No value
as.character(zoom.turbo)
html_text(zoom.turbo)

# Using XML and selectr
# htmlParse() with asText = TRUE needs a character string, not an xml_node
doc <- htmlParse(as.character(zoom.turbo), asText = TRUE)
xmlValue(querySelector(doc, "span"))

Solution

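  • For websites that are difficult to scrape, for example where the content is generated dynamically, you can use RSelenium. read_html() only downloads the page source and does not run JavaScript, so a value that a script fills in after the page loads (as this widget's rate most likely is) never reaches rvest. With RSelenium and a browser running in a Docker container, you can drive a real browser from R and read the page after it has rendered.

    I have used this method to scrape a website with a dynamic login script that I could not get to work with other methods.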
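
A minimal sketch of that approach, not a tested recipe for this particular site. It assumes a Selenium server is already listening on localhost:4444 (for example, one started from a standalone Firefox Docker image) and that the span keeps the data-widget attribute shown in the question. The helper names extract_rate and scrape_rate and the five-second wait are illustrative choices, not part of any package.

```r
library(rvest)

# CSS selector for the span from the question, matched by its data-widget
# attribute rather than by position on the page.
rate_selector <- 'span[data-widget="turboBinary_tradologic1_rate"]'

# Pull the rate text out of HTML that a browser has already rendered.
extract_rate <- function(page_source) {
  read_html(page_source) %>%
    html_node(rate_selector) %>%
    html_text(trim = TRUE)
}

# Load the page in a remote browser, give its scripts time to fill in the
# widget, then extract the rate from the rendered page source.
# Assumption: a Selenium server is reachable on localhost:4444, e.g. started
# with `docker run -d -p 4444:4444 selenium/standalone-firefox`.
scrape_rate <- function(url, wait_seconds = 5) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = "localhost",
    port = 4444L,
    browserName = "firefox"
  )
  remDr$open()
  on.exit(remDr$close(), add = TRUE)

  remDr$navigate(url)
  Sys.sleep(wait_seconds)  # crude fixed wait; the right delay is a guess

  # getPageSource() returns a list whose first element is the HTML string.
  extract_rate(remDr$getPageSource()[[1]])
}
```

With the server running, scrape_rate("https://www.zoomtrader.com/trade-now?game=turbo") should return the rate as a character string, provided the widget has been populated within the wait. Keeping the parsing in extract_rate means the selector can be checked against a saved HTML snippet without starting a browser.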