Wait for rgbif download to complete before proceeding


I am developing a small application in R Shiny. Part of the application needs to query GBIF to download species occurrence data, which is possible using rgbif. The function rgbif::occ_download() submits a download request to GBIF, and rgbif::occ_download_meta() reports whether GBIF has fulfilled that request. For example:

geometry <- "POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))"
res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")
occ_download_meta(res)

<<gbif download metadata>>
  Status: RUNNING
  Format: SPECIES_LIST
  Download key: 0004089-190415153152247
  Created: 2019-04-25T09:18:20.952+0000
  Modified: 2019-04-25T09:18:21.045+0000
  Download link: http://api.gbif.org/v1/occurrence/download/request/0004089-190415153152247.zip
  Total records: 0

So far, so good. However, the next function, rgbif::occ_download_get(), can't fetch the data for downstream analysis until GBIF has finished preparing it, that is, until occ_download_meta(res) reports Status: SUCCEEDED.

How can I make the session wait until GBIF has finished preparing the download? I cannot hard-code a wait time into the script, because extents of different sizes take GBIF different amounts of time to process, and the number of other active users querying the service can also change the wait. I therefore need some sort of check that Status is SUCCEEDED before proceeding.

I have copied some skeleton code with comments below.

library(rgbif)

geometry <- "POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))" # Define boundary
res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")

# WAIT HERE UNTIL Status == SUCCEEDED
occ_download_meta(res)

x <- occ_download_get(res, overwrite = TRUE) # Download data
data <- occ_download_import(x) # Import into R


Solution

  • rgbif maintainer here. You could do something like what we do within the occ_download_queue() function:

    res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")
    still_running <- TRUE
    status_ping <- 3 # seconds between status checks
    while (still_running) {
      meta <- occ_download_meta(res)
      status <- meta$status
      # keep looping until the request reaches a finished state
      still_running <- !(tolower(status) %in% c("succeeded", "killed"))
      Sys.sleep(status_ping) # sleep between pings
    }
    

    You will probably want to check for both SUCCEEDED and KILLED, and do something different if the download was killed (for example, raise an error instead of calling occ_download_get()).
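
Building on the polling loop above, here is a minimal sketch of a reusable helper that also gives up after a time limit and returns the final status so the caller can branch on it. The name wait_for_download(), its arguments, and the get_meta parameter are hypothetical and not part of rgbif. Treating CANCELLED and FAILED as finished states is an assumption beyond the two statuses mentioned in the answer.

```r
# Hypothetical helper (not part of rgbif): block until a GBIF download
# request reaches a finished status, or stop after `timeout` seconds.
# `get_meta` defaults to rgbif::occ_download_meta and can be swapped out
# for a stub when testing without network access.
wait_for_download <- function(res, ping = 3, timeout = 3600,
                              get_meta = rgbif::occ_download_meta) {
  # SUCCEEDED and KILLED come from the answer above; CANCELLED and FAILED
  # are assumed to be additional finished states.
  finished <- c("SUCCEEDED", "KILLED", "CANCELLED", "FAILED")
  deadline <- Sys.time() + timeout
  repeat {
    status <- toupper(get_meta(res)$status)
    if (status %in% finished) {
      return(status)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for GBIF download; last status: ", status)
    }
    Sys.sleep(ping) # pause between pings
  }
}

# Possible usage with the `res` object from occ_download():
# status <- wait_for_download(res)
# if (identical(status, "SUCCEEDED")) {
#   x <- occ_download_get(res, overwrite = TRUE)
#   data <- occ_download_import(x)
# } else {
#   stop("GBIF download did not succeed; status: ", status)
# }
```

Note that a blocking loop like this will hold up the R session while it waits, which may matter inside a Shiny app. Recent rgbif releases also appear to provide an occ_download_wait() function that may make a hand-rolled loop unnecessary; check the documentation for your installed version.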