Tags: r, curl, httr, rcurl, tcltk

Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host:


My R code (see below) generates these errors in some cases:

[1] "2023-08-12 16:47:37.463"
Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: api.abc.com
Request failed [ERROR]. Retrying in 1.3 seconds...
Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: api.abc.com
Request failed [ERROR]. Retrying in 1 seconds...
Error in curl::curl_fetch_memory(url, handle = handle):
Could not resolve host: api.abc.com

api.abc.com is not the original API I use. I use a commercial API, and its provider notified me that their server was not down at the particular moment above. In some cases when the server was down, it returned HTTP code 503.

I have two questions:

  1. What can be the cause of these errors?
  2. How can I keep my script below running when these errors occur? Currently it breaks after these error messages. I was not expecting this, since I use RETRY with GET in my code.

My code below is called every 10 seconds with the scheduler tclTaskSchedule (see the end of the code). In this example code I have used a free API (universities.hipolabs.com) as a stand-in for the commercial one.

library(httr) # accessing APIs
library(jsonlite) # JSON parsing
library(dplyr)
library(readr)
library(purrr)
library(tidyr)
library(stringr)
library(tibble)
library(tcltk2)
library(lubridate)

run_api_once <- function() {

  mydatalist <- list() # create an empty list

  my_next_page_with_number <- "http://universities.hipolabs.com/search?country=United+States"

  mydata1 <- RETRY("GET", my_next_page_with_number)

  if (mydata1$status_code != 200) {
    print(mydata1$status_code)
    # http_responses is a list that already exists in the global environment
    http_responses <<- append(http_responses, paste(mydata1$status_code, Sys.time()))
    has_more_pages <- FALSE

  } else {

    rawdata <- rawToChar(mydata1$content)
    mydata2 <- fromJSON(rawdata, flatten = FALSE, simplifyVector = FALSE)

    mydata <- mydata2

    mydatalist <- c(mydatalist, mydata)
  }

  y <- Sys.time()
  y <- format(y, "%Y-%m-%d %H:%M")
  print(y)

  users <- tibble(user = mydatalist)
  myvar <<- users %>% unnest_wider(user)

  return(myvar)
}


# call function every 10 seconds:
tclTaskSchedule(10000, run_api_once(), id = "run_api_once", redo = TRUE)

# end session:
tclTaskDelete(NULL)

I suppose it is irrelevant, but for completeness: I stream the content of myvar to a local server on my PC with plumber. See the code below:

# stream df myvar to local api at port 8405:
library(plumber)
pr("D:/plumber_universities2test.R") %>%
# pr("C:/plumber_universities2test.R") %>%
  pr_run(port=8405)

Which calls this script:

library(plumber)
library(dplyr)

#* @param symbol Ticker symbol (just to input something in the function)
#* @get /return
#* @serializer json list(na="string")

universities_data <- function(symbol) {
  # myvar is created and updated by run_api_once() in the global environment
  data <- myvar
  data
}
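
For reference, once pr_run() is listening, the endpoint can be queried from another R session. A minimal sketch (the symbol value is just a placeholder):

library(httr)
library(jsonlite)

# query the local plumber endpoint on port 8405
resp <- GET("http://localhost:8405/return", query = list(symbol = "test"))
fromJSON(rawToChar(resp$content))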

Thanks a lot!


Solution

  • To answer your questions:

    1. There are a couple of possible reasons: you are not connected to the internet, your firewall is blocking httr, or you are making a request to an invalid URL. I can't be sure without seeing the actual URL you are making the request to, but I would guess the third option is the most likely. Check whether you are making a mistake while pasting together a particular URL, for example "google.comsearch" instead of "google.com/search". (A small connectivity check sketch is included at the end of this answer.)
    2. The reason why RETRY is not acting the way you expect is that this is not an HTTP error status returned by the server; your request simply cannot be executed at all. To demonstrate the difference, let's look at the behaviour of a simple function that makes a request to a URL that automatically returns an HTTP error and to one that does not exist at all:
    library(httr)
    
    test_fun <- function(u) {
      RETRY("GET", u, times = 2)
      print("still running")
    }
    
    # response contains error
    test_fun("https://httpbin.org/status/429")
    #> Request failed [429]. Retrying in 1 seconds...
    #> [1] "still running"
    
    # no response since there is no server at `test.coms`
    test_fun("test.coms")
    #> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: test.coms
    #> Request failed [ERROR]. Retrying in 1 seconds...
    #> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: test.coms
    

    Created on 2023-08-13 with reprex v2.0.2

    As you can see, the first example still executes the remaining code of the function, while the second one stops with an error. I would suggest carefully checking why the requests are not getting to the server, and if you are certain that there is no better way, you can wrap try() around RETRY:

    mydata1 <- try(RETRY("GET", my_next_page_with_number))
    # if the request itself failed (e.g. the host could not be resolved),
    # substitute a dummy response so the status-code check still works
    if (is(mydata1, "try-error")) mydata1 <- list(status_code = 404)
    if (mydata1$status_code != 200) {
      # your code ...
    }
    

    But the behaviour of RETRY is correct in my opinion, as it is not simply ignoring what is probably a mistake in the code or in your internet configuration (not a server-side issue).
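
    As mentioned under point 1, if the failures turn out to be intermittent DNS or connectivity problems rather than a typo in the URL, you can also check connectivity before making the request. A minimal sketch using the curl package (the host name is just the example API, not your commercial one):

    library(curl)

    api_host <- "universities.hipolabs.com"  # placeholder for the real API host

    # has_internet() and nslookup(error = FALSE) return FALSE/NULL instead of
    # throwing an error, so they are safe to call inside the scheduled function
    if (has_internet() && !is.null(nslookup(api_host, error = FALSE))) {
      message("host resolves, proceeding with the request")
    } else {
      message("no internet connection or DNS lookup failed, skipping this run")
    }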