Scraping text in whitebox


I am trying to collect some Dutch historical election data. Below you see the code I have been using. I still need to figure out how to iterate the process for every 'Gemeente', but my main problem now is that I am not able to scrape the election results, which are contained in a box (i.e., the results object does not capture that information). Do you have any suggestions on how to proceed? Thank you.

library(RSelenium)

rD <- rsDriver(browser="firefox", port=4545L, verbose=FALSE)
remDr <- rD[["client"]]

# Navigate to the URL
url <- 'https://www.verkiezingsuitslagen.nl/verkiezingen/detail/TK19250701/663214'
remDr$navigate(url)

dropdown1 <- remDr$findElement(using = 'id', value = "2")
dropdown1$clickElement()

dropdown2 <- remDr$findElement(using = 'id', value = "3")
dropdown2$clickElement()

option <- remDr$findElement(using = 'xpath', "/html/body/main/div/div/div[2]/div[1]/div/div[2]/div/select/option[2]")
option$clickElement()

results <- remDr$findElement(using = 'class name', value = "whitebox")

Solution

  • All the data is stored in a series of JSON files; finding them is a matter of searching the Network tab of your browser's developer tools.
    The code below pulls the list of regional codes and retrieves the results file for one region.

    library(dplyr)
    library(jsonlite)
    
    # the regional codes are stored here
    codes <- jsonlite::fromJSON("https://www.verkiezingsuitslagen.nl/verkiezingen/StemmingChartUitslagDataJson?stemmingId=10778")
    codes <- codes$UitslagPerRegio
    
    # election results are stored in JSON format at this base URL
    baseURL <- "https://www.verkiezingsuitslagen.nl/verkiezingen/detailJson/TK19250701/"
    
    #for the town of Goor
    regionCode <- codes[codes$RegioNaam=="Goor", ]$StemregioId
    
    # results are stored in a complex nested list:
    jsonlite::fromJSON(paste0(baseURL, regionCode))
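To repeat this for every 'Gemeente', one option is to loop over the rows of `codes` and request each region's file in turn. The following is only a minimal sketch: the helper name `fetch_all_regions` and its `fetch` and `pause` arguments are hypothetical, and it assumes `codes` has the `RegioNaam` and `StemregioId` columns used above. The structure of each returned list is not inspected here.

```r
# Hypothetical helper (not part of any package): fetch the results file for
# every region listed in `codes`. Requires the jsonlite package when the
# default `fetch` is used.
#   codes    : data frame with columns RegioNaam and StemregioId (as above)
#   base_url : the detailJson base URL, ending in "/"
#   fetch    : function that takes a URL and returns parsed JSON
#   pause    : seconds to wait between requests, to be polite to the server
fetch_all_regions <- function(codes, base_url,
                              fetch = jsonlite::fromJSON, pause = 1) {
  results <- vector("list", nrow(codes))
  names(results) <- codes$RegioNaam

  for (i in seq_len(nrow(codes))) {
    region_url <- paste0(base_url, codes$StemregioId[i])

    # Keep going if one region fails; that entry is left as NULL.
    value <- tryCatch(
      fetch(region_url),
      error = function(e) {
        warning("Failed to fetch ", region_url, ": ", conditionMessage(e))
        NULL
      }
    )

    # Assign via a one-element list so that a NULL does not drop the slot.
    results[i] <- list(value)
    Sys.sleep(pause)
  }

  results
}
```

With the objects defined above, `all_results <- fetch_all_regions(codes, baseURL)` would return a list named by region, so `all_results[["Goor"]]` should hold the same nested list as the single request shown earlier. Passing the fetch function as an argument makes it possible to try the loop with a stub before sending real requests to the site.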