I'm having trouble with the code below. The function test fetches data from a website and works well for any single value of i between 2 and 33000. But when I loop over all the pages, I get parsing errors and several identical rows in my data frame.
library(rvest)
library(chromote)
library(jsonlite)
library(dplyr)
test <- function(i) {
  b <- ChromoteSession$new()
  p <- b$Page$loadEventFired(wait_ = FALSE)
  b$Page$navigate(paste("https://www.ecologie.gouv.fr/sru_api/api/towns/", i, sep = ""), wait_ = FALSE)
  b$wait_for(p)
  html <- b$Runtime$evaluate('document.documentElement.outerHTML')
  content <- read_html(html$result$value)
  data_json <- html_text(content)
  df <- fromJSON(data_json)
  return(df)
}
ma_liste <- list()
n <- 100
for (i in 2:n) {
  tryCatch({
    ma_liste <- c(ma_liste, list(test(i)))
  })
}
ma_liste
dataframe <- do.call(rbind, ma_liste)
dataframe <- as.data.frame(dataframe)
I tried to skip the problematic iterations with tryCatch, but that doesn't fix the duplicated rows (and it skips a lot of data). Can you help me with this? Thanks.
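As a point of comparison, tryCatch() only intercepts an error when it is given an error handler; without an error = argument the error still propagates. The sketch below shows one way to skip failing iterations while keeping track of which ones failed. It runs offline: fetch_one() is a hypothetical stand-in for test(i) that fails on purpose for some inputs, and ids, results and failed are illustrative names, not part of the code above.

```r
# fetch_one() is a hypothetical stand-in for test(i): it fails on multiples of 3.
fetch_one <- function(i) {
  if (i %% 3 == 0) stop("simulated parsing error for i = ", i)
  data.frame(id = i, value = i * 10)
}

ids <- 2:10
results <- vector("list", length(ids))  # one slot per id, filled at most once
failed <- integer(0)

for (k in seq_along(ids)) {
  # The error handler returns NULL instead of letting the error stop the loop.
  out <- tryCatch(fetch_one(ids[k]), error = function(e) NULL)
  if (is.null(out)) {
    failed <- c(failed, ids[k])  # remember which ids need another attempt
  } else {
    results[[k]] <- out
  }
}

# Drop the empty slots before binding the rows together.
df <- do.call(rbind, Filter(Negate(is.null), results))
```

Writing each result into its own slot, rather than appending with c(), means an id can appear at most once in the output, and failed lists the ids that can be retried later.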
The problem persists on my own computer. Since the connection turned out to be the cause, I made the loop retry every failed iteration with tryCatch, and that works fine for me. I conclude that the problem comes from my proxy/firewall or something else that is independent of any code you could provide. The remaining issue is execution speed, but that matters less to me.
library(rvest)
library(chromote)
library(jsonlite)
library(dplyr)
library(progress)
test <- function(i) {
  b <- ChromoteSession$new()
  # Close the session when the function exits, even if a later step fails
  on.exit(b$close(), add = TRUE)
  p <- b$Page$loadEventFired(wait_ = FALSE)
  b$Page$navigate(paste("https://www.ecologie.gouv.fr/sru_api/api/towns/", i, sep = ""), wait_ = FALSE)
  b$wait_for(p)
  html <- b$Runtime$evaluate('document.documentElement.outerHTML')
  content <- read_html(html$result$value)
  data_json <- html_text(content)
  df <- fromJSON(data_json)
  return(df)
}
start.time <- Sys.time()
ma_liste <- list()
n <- 100
pb <- progress_bar$new(total = n - 1) # the loop runs n - 1 times (i = 2, ..., n)
for (i in 2:n) {
  pb$tick()
  retry <- TRUE
  while (retry) {
    tryCatch({
      ma_liste <- c(ma_liste, list(test(i)))
      retry <- FALSE # No error, so no need to retry
    }, error = function(e) {
      message("Error for i = ", i, ": ", conditionMessage(e))
      Sys.sleep(0.001) # Wait briefly before retrying
    })
  }
}
dataframe <- do.call(rbind, ma_liste)
dataframe <- as.data.frame(dataframe)
end.time <- Sys.time()
time.taken <- round(end.time - start.time, 2)
time.taken
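On the two open points (a retry loop that never gives up, and speed), here is a minimal sketch of one possible variant. It assumes that opening a new ChromoteSession for every page is a large part of the run time, which I have not measured. fetch_town(), with_retry() and scrape_towns() are hypothetical names; the chromote, rvest and jsonlite calls are the same ones used in the code above.

```r
# Hypothetical variant of test(): takes an already-open session `b`
# instead of creating (and closing) one on every call.
fetch_town <- function(b, i) {
  p <- b$Page$loadEventFired(wait_ = FALSE)
  b$Page$navigate(paste0("https://www.ecologie.gouv.fr/sru_api/api/towns/", i), wait_ = FALSE)
  b$wait_for(p)
  html <- b$Runtime$evaluate("document.documentElement.outerHTML")
  jsonlite::fromJSON(rvest::html_text(rvest::read_html(html$result$value)))
}

# Call f() up to max_attempts times, pausing `wait` seconds between tries.
# Returns the first successful result, or NULL if every attempt failed.
with_retry <- function(f, max_attempts = 5, wait = 1) {
  for (attempt in seq_len(max_attempts)) {
    out <- tryCatch(f(), error = function(e) e)
    if (!inherits(out, "error")) return(out)
    message("attempt ", attempt, " failed: ", conditionMessage(out))
    if (attempt < max_attempts) Sys.sleep(wait)
  }
  NULL
}

# Fetch every id with the same session; ids that never succeed stay NULL.
scrape_towns <- function(b, ids, max_attempts = 5, wait = 1) {
  res <- lapply(ids, function(i) {
    with_retry(function() fetch_town(b, i), max_attempts, wait)
  })
  names(res) <- ids
  res
}

# Usage (needs a local Chrome installation, so left commented out):
# b <- chromote::ChromoteSession$new()
# pages <- scrape_towns(b, 2:100)
# b$close()
# dataframe <- as.data.frame(do.call(rbind, Filter(Negate(is.null), pages)))
```

Capping the number of attempts means a page that is permanently unavailable ends up as a NULL entry instead of blocking the loop forever, and the list names show which ids are missing.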