r asynchronous plumber r-future

Future callr delay with libraries


I'm using future promises with APIs built with the plumber package. To avoid RAM accumulation I use the callr plan (check this post for more detail), but it adds a delay to each API call (taking some APIs from 0.1s to 3s, which heavily affects performance), and I'm looking for a way to reduce this delay.

When I run API1 from the reprex below, it takes 0.2s with a multisession plan and 3.1s with callr. In both cases, the time to compute the promise is very small (0.01s, as measured with Sys.time within the API). I thus conclude the delay comes from the promise preparation.

The delay seems connected to the R packages used within the promise. I compared 3 APIs: API1 calls rast() from the terra package, which was attached beforehand with library(terra). API2 calls terra::rast() with an explicit namespace prefix. API3 does not use terra but calls Sys.sleep() for the time the terra function takes to compute. API1 and API2 both take 3 seconds in total, but they differ in the time spent inside the promise (0.01s for API1 versus 2.4s for API2). API3 is much quicker at 0.7s in total (including 0.03s for the promise). It thus seems that the delay comes from handling packages (I saw something similar with the sf package), and that attaching the package beforehand (library(terra)) or calling it with a namespace prefix (terra::rast()) makes no difference. Is there a way to reduce this delay?

Here is the reprex:

Script to launch the APIs (comment out L3 or L4 of the script to test with multisession or callr):

### Set the asynchronous coding
library(promises) ; library(future) ; library(future.callr)
#future::plan("multisession")
future::plan(future.callr::callr)

### Plumber app
library(plumber)

### Other libraries
library(terra)

### Measure time to compute the terra function
T1_delay <- Sys.time()
terra_delay <- terra::rast(xmin=-10, xmax=10, ymin=-10, ymax=10, resolution=10, crs="+init=epsg:4326")
Delay <- Sys.time() - T1_delay

### Start app
# Store the router under its own name so it does not shadow plumber::pr()
app <- pr("Test_callr_APIs.R")
app %>% pr_run()

Code to save in "Test_callr_APIs.R":

#* API1
#* @get species/<scientific_name>/API1
#* @param scientific_name:string Scientific Name
#* @serializer unboxedJSON
#* @tag test
function(scientific_name) {
  
  Prom<-future({
    T1<-Sys.time()
    
    raster_test<-rast(xmin=-10, xmax=10, ymin=-10, ymax=10, resolution=10, crs="+init=epsg:4326")

    cat(Sys.time()-T1, "\n")
    return(list(object_to_return=2))
    
  }, gc=TRUE, seed=TRUE)
  
  return(Prom)
  
}


#* API2
#* @get species/<scientific_name>/API2
#* @param scientific_name:string Scientific Name
#* @serializer unboxedJSON
#* @tag test
function(scientific_name) {
  
  Prom<-future({
    T1<-Sys.time()
    
    raster_test<-terra::rast(xmin=-10, xmax=10, ymin=-10, ymax=10, resolution=10, crs="+init=epsg:4326")
    
    cat(Sys.time()-T1, "\n")
    return(list(object_to_return=2))
    
  }, gc=TRUE, seed=TRUE)
  
  return(Prom)
  
}


#* API3
#* @get species/<scientific_name>/API3
#* @param scientific_name:string Scientific Name
#* @serializer unboxedJSON
#* @tag test
function(scientific_name) {
  
  Prom<-future({
    T1<-Sys.time()
    
    Sys.sleep(Delay)  
    
    cat(Sys.time()-T1, "\n")
    return(list(object_to_return=2))
    
  }, gc=TRUE, seed=TRUE)
  
  return(Prom)
  
}

Solution

  • The added latency you observe comes from future.callr::callr launching a fresh R process for each future (using callr::r_bg()) and then loading all required packages in that new process. Both steps take time, and there is not much you can do about it. The only options I can think of are making callr::r_bg() faster (I doubt that is possible) or speeding up the startup of R and Rscript itself (something for the R-devel mailing list).
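
To get a rough feel for how much of the delay is plain process startup and how much is package loading, one could time a fresh Rscript process with and without a package attached. This is only a sketch using base R: time_fresh_r() is a hypothetical helper, not part of future or callr, and it starts Rscript directly rather than going through callr::r_bg(), so the numbers only approximate what the callr plan pays per future. "stats" is a stand-in package so that the example runs anywhere; replacing it with "terra" or "sf" (if installed) should be closer to the case in the question.

```r
# Hypothetical helper (not part of future or callr): time a fresh Rscript
# process that optionally attaches one package and then exits.
time_fresh_r <- function(pkg = NULL) {
  rscript <- file.path(R.home("bin"), "Rscript")
  expr <- if (is.null(pkg)) {
    "invisible(NULL)"
  } else {
    sprintf("suppressPackageStartupMessages(library(%s))", pkg)
  }
  # Wall-clock time of the child process, with its output discarded
  timing <- system.time(
    system2(rscript, c("--vanilla", "-e", shQuote(expr)),
            stdout = FALSE, stderr = FALSE)
  )
  timing[["elapsed"]]
}

bare     <- time_fresh_r()         # startup of R alone
with_pkg <- time_fresh_r("stats")  # stand-in; try "terra" or "sf" if installed

cat(sprintf("bare R startup: %.2fs\nwith package:   %.2fs\n", bare, with_pkg))
```

If the gap between the two timings is close to the extra seconds seen under the callr plan, that would be consistent with the explanation above: the cost is paid once per future, in the new process, regardless of how the package is referenced inside the promise.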