Tags: r, database, httr

Make an API request in R


First time trying to do database stuff in R. I want to access a weather database to get weather data. I was following this guide: https://www.geeksforgeeks.org/r-language/access-collect-data-with-apis-in-r/

but I have problems building the query in the GET() function.

The database gives hints about what it expects to see, and this is the filter part to target one single .h5 file:

{
  "filter": {
    "and": {
      "children": [
        {
          "isEqual": {
            "field": "$contentType",
            "value": "0102FB01"
          }
        },
        {
          "and": {
            "children": [
              {
                "isEqual": {
                  "field": "Phenomenon",
                  "value": "COMP"
                }
              },
              {
                "contains": {
                  "field": "RMTitle",
                  "value": "comp.20250727100007.CAP.2567.h5"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

If someone wants to try, the database is this:

url <- "https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/archive"

So far I have tried:

params <- list(
  "$contentType" = "0102FB01",
  "Phenomenon" = "COMP",
  "RMTitle" = "comp.20250727100007.CAP.2567.h5"
)

result1 <- GET(url, query = params)

result1
#> Response [https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/archive?%24contentType=0102FB01&Phenomenon=COMP&RMTitle=comp.20250727100007.CAP.2567.h5]
#>   Date: 2025-07-28 18:07
#>   Status: 200
#>   Content-Type: application/json; charset=utf-8
#>   Size: 35.1 kB

Something is not working correctly: the actual file is 388 kB, but the response is only 35.1 kB.


Solution

  • Not quite sure why you chose /_vti_bin/RmApi.svc/archive as the endpoint. This JSON filter is probably generated through the KAIA service ( show API links / näita veebiteenuse linke ), which should also guide you to specific endpoints (and a different method; GET() will not work here):

    POST request URL for getting metadata: https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query

    POST request URL for getting zipped files: https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files

    Filter for API requests:
    /.../

    That generated filter can be simplified a bit, but it still needs to be a nested structure where each (nested) JSON object literal translates to a (nested) R list. To get the equivalent R list, you could start with
    jsonlite::fromJSON(json_filter_string, simplifyVector = FALSE) |> dput()
    and modify it as you see fit. We could also build a small helper function to generate that filter:

    kaia_filter <- function(title_contains, phen = "COMP", ctype = "0102FB01") {
      # simplified structure, without nested and{}s
      list(filter = list(and = list(children = list(
        list(isEqual  = list(field = "$contentType", value = ctype)),
        list(isEqual  = list(field = "Phenomenon",   value = phen)),
        list(contains = list(field = "RMTitle",      value = title_contains))
      ))))
    }
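
    To sanity-check the helper, you can serialize its output the way encode = "json" will (httr uses jsonlite::toJSON() with auto_unbox = TRUE under the hood); the printed JSON should match the generated filter, minus the extra nested and{}:

    # serialize the nested list the same way httr's encode = "json" does
    kaia_filter("comp.20250727100007.CAP.2567.h5") |>
      jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)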
    

    httr

    A POST request to download an archive with matching h5 file(s) would then look something like this:

    httr::POST(
      "https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files",
      body = kaia_filter(title_contains = "comp.20250727100007.CAP.2567.h5"), 
      encode = "json",
      httr::write_disk("zippedFiles_httr.zip")
    )
    #> Response [https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files]
    #>   Date: 2025-07-28 21:09
    #>   Status: 200
    #>   Content-Type: application/zip
    #>   Size: 376 kB
    #> <ON DISK>  \Temp\Rtmp46wXUe\reprex-6cf43ddd6a8a-ivory-pewee\zippedFiles_httr.zip
    
    unzip("zippedFiles_httr.zip", list = TRUE)
    #>                              Name Length                Date
    #> 1 comp.20250727100007.CAP.2567.h5 396521 2025-07-29 00:09:00
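
    One caveat: write_disk() saves whatever body comes back, error messages included, so it can be worth assigning the response and checking the status before unzipping. Same request as above, just captured in resp (overwrite = TRUE is needed for re-runs):

    resp <- httr::POST(
      "https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files",
      body = kaia_filter(title_contains = "comp.20250727100007.CAP.2567.h5"),
      encode = "json",
      httr::write_disk("zippedFiles_httr.zip", overwrite = TRUE)
    )
    # raises an R error for HTTP errors (status 400 and above)
    httr::stop_for_status(resp)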
    

    httr2

    Note that {httr}'s current development status is superseded. While it's not going away any time soon, its authors recommend using {httr2} instead. It might be somewhat more verbose for simple requests, but it's also easier to debug and comes with convenient tools for somewhat more advanced workflows (*).

    Same single request with httr2:

    library(httr2)
    
    request("https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files") |> 
      req_body_json(
        kaia_filter(title_contains = "comp.20250727100007.CAP.2567.h5")
      ) |> 
      req_perform(path = "zippedFiles_httr2.zip")
    #> <httr2_response>
    #> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files
    #> Status: 200 OK
    #> Content-Type: application/zip
    #> Body: On disk 'zippedFiles_httr2.zip' (376115 bytes)
        
    unzip("zippedFiles_httr2.zip", list = TRUE)
    #>                              Name Length                Date
    #> 1 comp.20250727100007.CAP.2567.h5 396521 2025-07-29 00:09:00
    

    *) httr2 iterators and multi-responses

    To gather some details from the comments, here's an example httr2 session with a workflow that requires multiple requests: collecting metadata with a filter that returns more results than fit into a single response, so we have to deal with paginated responses.

    Even though the upper pageSize limit appears to be user-controlled in this case, pagination might still be required, as the poor service might rather die trying: attempting to get metadata for files matching "comp.2025" (60,000+ matches as of now) in a single response currently results in error 500 with a "SqlException: The query processor ran out of internal resources and could not produce a query plan /.../" error message.

    The metadata request documentation is a bit sparse, but the mock examples on the API page provide a few of the missing bits.

    We'll use the same filter function (a title filter is simple yet practical) with "comp.202507" to retrieve metadata for all composite files from July 2025; there are 8,000+ matches, so with pageSize = 4000 it should take 3 requests. Some of the httr2 functions used here are optional in this case (req_template(), req_dry_run(), ..); they're included to encourage going through the package doc index to see if it provides something for your particular problem / use case / pattern.

    Preparing 1st request
    library(httr2)
    
    # 1st meta-data request with added pageSize to control pagination
    req_meta <- 
      request("https://avaandmed.keskkonnaportaal.ee") |> 
      req_template("/_vti_bin/RmApi.svc/{list}/items/query", list = "active") |> 
      req_body_json(kaia_filter(title_contains = "comp.202507")) |> 
      req_body_json_modify(pageSize = 4000)
    
    # Check request without sending anything
    req_dry_run(req_meta) 
    #> POST /_vti_bin/RmApi.svc/active/items/query HTTP/1.1
    #> accept: */*
    #> accept-encoding: deflate, gzip
    #> content-length: 210
    #> content-type: application/json
    #> host: avaandmed.keskkonnaportaal.ee
    #> user-agent: httr2/1.2.1 r-curl/6.4.0 libcurl/8.14.1
    #> 
    #> {
    #>   "filter": {
    #>     "and": {
    #>       "children": [
    #>         {
    #>           "isEqual": {
    #>             "field": "$contentType",
    #>             "value": "0102FB01"
    #>           }
    #>         },
    #>         {
    #>           "isEqual": {
    #>             "field": "Phenomenon",
    #>             "value": "COMP"
    #>           }
    #>         },
    #>         {
    #>           "contains": {
    #>             "field": "RMTitle",
    #>             "value": "comp.202507"
    #>           }
    #>         }
    #>       ]
    #>     }
    #>   },
    #>   "pageSize": 4000
    #> }
    
    Iteration
    # Callback function for req_perform_iterative(),
    # a slight modification of the example in `?req_perform_iterative`.
    # It's called with the previous response and the next request.
    # Its purpose is to modify and return the next req using data from the previous resp;
    # it also sets the stop condition (iteration stops when next_req() returns NULL)
    next_req <- function(resp, req) {
      # parse JSON response, extract nextBookmark & number of returned documents
      body <- resp_body_json(resp)
      bookmark <- body$nextBookmark
      n_doc <- length(body$documents)
      # Stop condition (response JSON did not include `documents` array or the array is empty)
      if (n_doc == 0)
        return(NULL)
      # Get the total number of pages;
      # configures the progress bar and sets the max number of requests
      signal_total_pages(ceiling(body$numFound / n_doc))
      # Modify next request to continue from `bookmark`
      req |> req_body_json_modify(bookmark = bookmark)
    }
    
    # Make as many requests as it takes, return a list of responses
    resps <- req_perform_iterative(req_meta, next_req = next_req)
    #> iterating ■■■■■■■■■■■■■■■■■■■■■ 67% | ETA: 3s
    
    resps
    #> [[1]]
    #> <httr2_response>
    #> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (3096237 bytes)
    #> 
    #> [[2]]
    #> <httr2_response>
    #> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (3096066 bytes)
    #> 
    #> [[3]]
    #> <httr2_response>
    #> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (366180 bytes)
    
    Inspecting 1st response and next request
    resp_body_json(resps[[1]]) |> listviewer::jsonedit()
    

    (listviewer::jsonedit() screenshot, capturing part of the JSON response)

    Looks like something that jsonlite::fromJSON() could simplify and flatten:

    # resp_body_json() args are passed to fromJSON() 
    r1 <- resp_body_json(resps[[1]], simplifyVector = TRUE, flatten = TRUE)
    lobstr::tree(r1, max_depth = 1)
    #> <list>
    #> ├─numFound: 8473
    #> ├─nextBookmark: "AoEnNTEwNDQ5Mw=="
    #> ├─documents: S3<data.frame>...
    #> └─error: <NULL>
    pillar::glimpse(r1$documents)
    #> Rows: 4,000
    #> Columns: 24
    #> $ id                                                <int> 5026629, 5026647, 50…
    #> $ uniqueId                                          <chr> "cd209414-5fc6-45cf-…
    #> $ contentTypeId                                     <chr> "0102FB01", "0102FB0…
    #> $ fileMetadata                                      <lgl> NA, NA, NA, NA, NA, …
    #> $ metadata.FP.CreatedAt                             <chr> "2025-07-01T03:10:35…
    #> $ metadata.FP.CreatedBy                             <int> 1, 1, 1, 1, 1, 1, 1,…
    #> $ metadata.FP.ModifiedAt                            <chr> "2025-07-01T03:10:35…
    #> $ metadata.FP.ModifiedBy                            <int> 1, 1, 1, 1, 1, 1, 1,…
    #> $ metadata.RMUniqueID                               <chr> "cd209414-5fc6-45cf-…
    #> $ metadata.RMTitle                                  <chr> "comp.20250701000007…
    #> $ metadata.S3Key                                    <chr> "Radar/Product_data/…
    #> $ metadata.S3ETag                                   <chr> "\"dd86461a4413edbaa…
    #> $ metadata.RMNotes                                  <lgl> NA, NA, NA, NA, NA, …
    #> $ metadata.RMKeywords                               <lgl> NA, NA, NA, NA, NA, …
    #> $ metadata.RMStatus                                 <chr> "Captured", "Capture…
    #> $ metadata.RMFileType                               <chr> "h5", "h5", "h5", "h…
    #> $ metadata.RMFileSize                               <int> 125, 129, 125, 128, …
    #> $ metadata.Timestamp                                <chr> "2025-07-01T03:00:07…
    #> $ metadata.RadarStation                             <lgl> NA, NA, NA, NA, NA, …
    #> $ metadata.RadarDataType                            <chr> "CAP", "CAP", "CAP",…
    #> $ metadata.Radar                                    <chr> "comp", "comp", "com…
    #> $ metadata.Phenomenon                               <chr> "COMP", "COMP", "COM…
    #> $ metadata.RMUsersWhoSubscribeToDocumentFileChanges <lgl> NA, NA, NA, NA, NA, …
    #> $ metadata.RMIncludeForSending                      <lgl> NA, NA, NA, NA, NA, …
    

    We can also extract the performed requests from the responses list for debugging. Checking the 2nd request, note how nextBookmark from the previous response ended up as the bookmark value for this request:

    resps[[2]] |> resp_request() |> req_dry_run()
    #> POST /_vti_bin/RmApi.svc/active/items/query HTTP/1.1
    #> accept: */*
    #> accept-encoding: deflate, gzip
    #> content-length: 240
    #> content-type: application/json
    #> host: avaandmed.keskkonnaportaal.ee
    #> user-agent: httr2/1.2.1 r-curl/6.4.0 libcurl/8.14.1
    #> 
    #> {
    #>   "filter": {
    #>     "and": {
    #>       "children": [
    #>         {
    #>           "isEqual": {
    #>             "field": "$contentType",
    #>             "value": "0102FB01"
    #>           }
    #>         },
    #>         {
    #>           "isEqual": {
    #>             "field": "Phenomenon",
    #>             "value": "COMP"
    #>           }
    #>         },
    #>         {
    #>           "contains": {
    #>             "field": "RMTitle",
    #>             "value": "comp.202507"
    #>           }
    #>         }
    #>       ]
    #>     }
    #>   },
    #>   "pageSize": 4000,
    #>   "bookmark": "AoEnNTEwNDQ5Mw=="
    #> }
    
    Collecting data from all resps into a single frame

    Repeating the previously tested data extraction on all responses, the result is a frame with 8,473 rows.

    # resps_data() takes a list of responses and a function that extracts 
    # data from a single response object
    meta_df <- 
      resps_data(
        resps,
        \(resp) resp_body_json(resp, simplifyVector = TRUE, flatten = TRUE)$documents
      ) |> 
      tibble::as_tibble()
    meta_df
    #> # A tibble: 8,473 × 24
    #>         id uniqueId             contentTypeId fileMetadata metadata.FP.CreatedAt
    #>      <int> <chr>                <chr>         <lgl>        <chr>                
    #>  1 5026629 cd209414-5fc6-45cf-… 0102FB01      NA           2025-07-01T03:10:35.…
    #>  2 5026647 5860f8e0-6a34-4470-… 0102FB01      NA           2025-07-01T03:15:35.…
    #>  3 5026667 054b3a92-5bd1-4ca1-… 0102FB01      NA           2025-07-01T03:20:35.…
    #>  4 5026685 6cfa007b-a10b-4637-… 0102FB01      NA           2025-07-01T03:25:35.…
    #>  5 5026705 55286282-0e83-45ed-… 0102FB01      NA           2025-07-01T03:30:35.…
    #>  6 5026723 4d7fe052-0024-4bed-… 0102FB01      NA           2025-07-01T03:35:35.…
    #>  7 5026743 be49ee59-b7ad-4175-… 0102FB01      NA           2025-07-01T03:40:35.…
    #>  8 5026761 739a21cb-fb1f-4923-… 0102FB01      NA           2025-07-01T03:45:36.…
    #>  9 5026783 ef21a0d6-0e65-4f88-… 0102FB01      NA           2025-07-01T03:50:36.…
    #> 10 5026801 a555666b-51c4-4136-… 0102FB01      NA           2025-07-01T03:55:36.…
    #> # ℹ 8,463 more rows
    #> # ℹ 19 more variables: metadata.FP.CreatedBy <int>,
    #> #   metadata.FP.ModifiedAt <chr>, metadata.FP.ModifiedBy <int>,
    #> #   metadata.RMUniqueID <chr>, metadata.RMTitle <chr>, metadata.S3Key <chr>,
    #> #   metadata.S3ETag <chr>, metadata.RMNotes <lgl>, metadata.RMKeywords <lgl>,
    #> #   metadata.RMStatus <chr>, metadata.RMFileType <chr>,
    #> #   metadata.RMFileSize <int>, metadata.Timestamp <chr>, …
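
    With the metadata in one frame, it's easy to narrow down to files of interest, e.g. to collect titles that could be fed back into kaia_filter() and the zipped-files endpoint. A sketch with dplyr (assuming meta_df from above; the date prefix in RMTitle makes simple string matching practical):

    library(dplyr)
    # titles of all composite files from a single day
    meta_df |>
      filter(startsWith(metadata.RMTitle, "comp.20250715")) |>
      pull(metadata.RMTitle)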