First time trying to do database stuff in R. I want to access a weather database to get weather data. Was following this guide https://www.geeksforgeeks.org/r-language/access-collect-data-with-apis-in-r/
but I have problems bulilding the query in the GET() function.
Database gives hints about what it expects to see and this is the filter part to target one single .h5 file :
{
"filter": {
"and": {
"children": [
{
"isEqual": {
"field": "$contentType",
"value": "0102FB01"
}
},
{
"and": {
"children": [
{
"isEqual": {
"field": "Phenomenon",
"value": "COMP"
}
},
{
"contains": {
"field": "RMTitle",
"value": "comp.20250727100007.CAP.2567.h5"
}
}
]
}
}
]
}
}
}
If someone wants to try, the database is this:
url <- "https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/archive"
So far I have tried:
params <- list(
"$contentType" = "0102FB01",
"Phenomenon" = "COMP",
"RMTitle" = "comp.20250727100007.CAP.2567.h5"
)
result1 <- GET(url, query = params)
result1 Response [https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/archive?%24contentType=0102FB01&Phenomenon=COMP&RMTitle=comp.20250727100007.CAP.2567.h5] Date: 2025-07-28 18:07 Status: 200 Content-Type: application/json; charset=utf-8 Size: 35.1 kB
Something is not working correctly. The actual file is 388 kB but response is only 35.1 kB
Not quite sure why you chose /_vti_bin/RmApi.svc/archive
for the endpoint. This JSON filter is probably generated through KAIA service ( show API links / näita veebiteenuse linke ), it should also guide you to use specific endpoints (and different methods, GET()
will not work):
POST request URL for getting metadata: https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
POST request URL for getting zipped files: https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files
Filter for API requests:
/.../
That generated filter can be simplified a bit, but it still needs to be a nested structure where each (nested) JSON object literal is translated to (nested) R list. To get equivalent R list, you could start with
jsonlite::fromJSON(json_filter_string, simplifyVector = FALSE) |> dput()
and modify it as you see fit. We could also build a small helper function to generate that filter:
kaia_filter <- function(title_contains, phen = "COMP", ctype = "0102FB01") {
# simplified structure, without nested and{}s
list(filter = list(and = list(children = list(
list(isEqual = list(field = "$contentType",value = ctype)),
list(isEqual = list(field = "Phenomenon" ,value = phen)),
list(contains = list(field = "RMTitle" ,value = title_contains))
))))
}
POST request to download an archive with matching h5 file(s) would then look something like this:
httr::POST(
"https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files",
body = kaia_filter(title_contains = "comp.20250727100007.CAP.2567.h5"),
encode = "json",
httr::write_disk("zippedFiles_httr.zip")
)
#> Response [https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files]
#> Date: 2025-07-28 21:09
#> Status: 200
#> Content-Type: application/zip
#> Size: 376 kB
#> <ON DISK> \Temp\Rtmp46wXUe\reprex-6cf43ddd6a8a-ivory-pewee\zippedFiles_httr.zip
unzip("zippedFiles_httr.zip", list = TRUE)
#> Name Length Date
#> 1 comp.20250727100007.CAP.2567.h5 396521 2025-07-29 00:09:00
Note that current {httr}
development status is superseded. While it's not going away any time soon, they do recommend using {httr2}
instead. It might be somewhat more verbose when making simple requests, but it's also easier to debug and comes with convenient tools for bit more advanced workflows (*).
Same single request with httr2
:
library(httr2)
request("https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files") |>
req_body_json(
kaia_filter(title_contains = "comp.20250727100007.CAP.2567.h5")
) |>
req_perform(path = "zippedFiles_httr2.zip")
#> <httr2_response>
#> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/zipped-files
#> Status: 200 OK
#> Content-Type: application/zip
#> Body: On disk 'zippedFiles_httr2.zip' (376115 bytes)
unzip("zippedFiles_httr2.zip", list = TRUE)
#> Name Length Date
#> 1 comp.20250727100007.CAP.2567.h5 396521 2025-07-29 00:09:00
To gather some details from comments, here's an example httr2
session with a workflow that requires multiple requests, i.e. collecting meta-data from a filter that would return more results than would fit into a single response so we'd have to deal with paginated responses.
Even though the upper pageSize limit appears to be user-controlled in this case, pagination might still be required as the poor service might rather die trying -- when one attempts to get meta-data for files matching "comp.2025"
(60000+ matches as of now) in a single response, it currently results with error 500 and "SqlException: The query processor ran out of internal resources and could not produce a query plan /.../" error message.
Meta-data request documentation is bit sparse, but mock examples on API page provide few missing bits:
pageSize
and bookmark
in request JSONnumFound
and nextBookmark
in response JSONWe'll use the same filter function (as title filter is simple yet practical) with "comp.202507"
to retrieve meta-data for all composite files from July 2025; there are 8000+ matches, so with pageSize = 4000
it should take 3 requests. Some used htrr2
functions are optional in this case (req_template
, req_dry_run
, ..), just to encourage to go through package doc index to see if it might provide something for any particular problem / use case / pattern.
library(httr2)
# 1st meta-data request with added pageSize to control pagination
req_meta <-
request("https://avaandmed.keskkonnaportaal.ee") |>
req_template("/_vti_bin/RmApi.svc/{list}/items/query", list = "active") |>
req_body_json(kaia_filter(title_contains = "comp.202507")) |>
req_body_json_modify(pageSize = 4000)
# Check request without sending anything
req_dry_run(req_meta)
#> POST /_vti_bin/RmApi.svc/active/items/query HTTP/1.1
#> accept: */*
#> accept-encoding: deflate, gzip
#> content-length: 210
#> content-type: application/json
#> host: avaandmed.keskkonnaportaal.ee
#> user-agent: httr2/1.2.1 r-curl/6.4.0 libcurl/8.14.1
#>
#> {
#> "filter": {
#> "and": {
#> "children": [
#> {
#> "isEqual": {
#> "field": "$contentType",
#> "value": "0102FB01"
#> }
#> },
#> {
#> "isEqual": {
#> "field": "Phenomenon",
#> "value": "COMP"
#> }
#> },
#> {
#> "contains": {
#> "field": "RMTitle",
#> "value": "comp.202507"
#> }
#> }
#> ]
#> }
#> },
#> "pageSize": 4000
#> }
# Callback function for req_perform_iterative(),
# a slight modification of example in `?req_perform_iterative`.
# It's called with previous response and next request.
# It's purpose is to modify and return next req using data from previous resp,
# also sets a stop condition (iterator stops when next_req() returns NULL)
next_req <- function(resp, req) {
# parse JSON response, extract nextBookmark & number of returned documents
body <- resp_body_json(resp)
bookmark <- body$nextBookmark
n_doc <- length(body$documents)
# Stop condition (response JSON did not include `documents` array or the array is empty)
if (n_doc == 0)
return(NULL)
# Get number of total pages,
# configures progress bar and sets max number of requests
signal_total_pages(ceiling(body$numFound / n_doc))
# Modify next request to continue from `bookmark`
req |> req_body_json_modify(bookmark = bookmark)
}
# Make as many requests as it takes, return a list of responses
resps <- req_perform_iterative(req_meta, next_req = next_req)
#> iterating ■■■■■■■■■■■■■■■■■■■■■ 67% | ETA: 3s
resps
#> [[1]]
#> <httr2_response>
#> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (3096237 bytes)
#>
#> [[2]]
#> <httr2_response>
#> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (3096066 bytes)
#>
#> [[3]]
#> <httr2_response>
#> POST https://avaandmed.keskkonnaportaal.ee/_vti_bin/RmApi.svc/active/items/query
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (366180 bytes)
resp_body_json(resps[[1]]) |> listviewer::jsonedit()
Looks like something that jsonlite::fromJSON()
could simplify and flatten:
# resp_body_json() args are passed to fromJSON()
r1 <- resp_body_json(resps[[1]], simplifyVector = TRUE, flatten = TRUE)
lobstr::tree(r1, max_depth = 1)
#> <list>
#> ├─numFound: 8473
#> ├─nextBookmark: "AoEnNTEwNDQ5Mw=="
#> ├─documents: S3<data.frame>...
#> └─error: <NULL>
pillar::glimpse(r1$documents)
#> Rows: 4,000
#> Columns: 24
#> $ id <int> 5026629, 5026647, 50…
#> $ uniqueId <chr> "cd209414-5fc6-45cf-…
#> $ contentTypeId <chr> "0102FB01", "0102FB0…
#> $ fileMetadata <lgl> NA, NA, NA, NA, NA, …
#> $ metadata.FP.CreatedAt <chr> "2025-07-01T03:10:35…
#> $ metadata.FP.CreatedBy <int> 1, 1, 1, 1, 1, 1, 1,…
#> $ metadata.FP.ModifiedAt <chr> "2025-07-01T03:10:35…
#> $ metadata.FP.ModifiedBy <int> 1, 1, 1, 1, 1, 1, 1,…
#> $ metadata.RMUniqueID <chr> "cd209414-5fc6-45cf-…
#> $ metadata.RMTitle <chr> "comp.20250701000007…
#> $ metadata.S3Key <chr> "Radar/Product_data/…
#> $ metadata.S3ETag <chr> "\"dd86461a4413edbaa…
#> $ metadata.RMNotes <lgl> NA, NA, NA, NA, NA, …
#> $ metadata.RMKeywords <lgl> NA, NA, NA, NA, NA, …
#> $ metadata.RMStatus <chr> "Captured", "Capture…
#> $ metadata.RMFileType <chr> "h5", "h5", "h5", "h…
#> $ metadata.RMFileSize <int> 125, 129, 125, 128, …
#> $ metadata.Timestamp <chr> "2025-07-01T03:00:07…
#> $ metadata.RadarStation <lgl> NA, NA, NA, NA, NA, …
#> $ metadata.RadarDataType <chr> "CAP", "CAP", "CAP",…
#> $ metadata.Radar <chr> "comp", "comp", "com…
#> $ metadata.Phenomenon <chr> "COMP", "COMP", "COM…
#> $ metadata.RMUsersWhoSubscribeToDocumentFileChanges <lgl> NA, NA, NA, NA, NA, …
#> $ metadata.RMIncludeForSending <lgl> NA, NA, NA, NA, NA, …
We can also extract performed requests from responses list for debugging. Checking 2nd request, note how nextBookmark
from previous response ended up as a bookmark
value for this request:
resps[[2]] |> resp_request() |> req_dry_run()
#> POST /_vti_bin/RmApi.svc/active/items/query HTTP/1.1
#> accept: */*
#> accept-encoding: deflate, gzip
#> content-length: 240
#> content-type: application/json
#> host: avaandmed.keskkonnaportaal.ee
#> user-agent: httr2/1.2.1 r-curl/6.4.0 libcurl/8.14.1
#>
#> {
#> "filter": {
#> "and": {
#> "children": [
#> {
#> "isEqual": {
#> "field": "$contentType",
#> "value": "0102FB01"
#> }
#> },
#> {
#> "isEqual": {
#> "field": "Phenomenon",
#> "value": "COMP"
#> }
#> },
#> {
#> "contains": {
#> "field": "RMTitle",
#> "value": "comp.202507"
#> }
#> }
#> ]
#> }
#> },
#> "pageSize": 4000,
#> "bookmark": "AoEnNTEwNDQ5Mw=="
#> }
resps
into a single frameRepeating previously tested data extraction on all responses, result is a frame with 8473 rows.
# resps_data() takes a list of responses and a function that extracts
# data from a single response object
meta_df <-
resps_data(
resps,
\(resp) resp_body_json(resp, simplifyVector = TRUE, flatten = TRUE)$documents
) |>
tibble::as_tibble()
meta_df
#> # A tibble: 8,473 × 24
#> id uniqueId contentTypeId fileMetadata metadata.FP.CreatedAt
#> <int> <chr> <chr> <lgl> <chr>
#> 1 5026629 cd209414-5fc6-45cf-… 0102FB01 NA 2025-07-01T03:10:35.…
#> 2 5026647 5860f8e0-6a34-4470-… 0102FB01 NA 2025-07-01T03:15:35.…
#> 3 5026667 054b3a92-5bd1-4ca1-… 0102FB01 NA 2025-07-01T03:20:35.…
#> 4 5026685 6cfa007b-a10b-4637-… 0102FB01 NA 2025-07-01T03:25:35.…
#> 5 5026705 55286282-0e83-45ed-… 0102FB01 NA 2025-07-01T03:30:35.…
#> 6 5026723 4d7fe052-0024-4bed-… 0102FB01 NA 2025-07-01T03:35:35.…
#> 7 5026743 be49ee59-b7ad-4175-… 0102FB01 NA 2025-07-01T03:40:35.…
#> 8 5026761 739a21cb-fb1f-4923-… 0102FB01 NA 2025-07-01T03:45:36.…
#> 9 5026783 ef21a0d6-0e65-4f88-… 0102FB01 NA 2025-07-01T03:50:36.…
#> 10 5026801 a555666b-51c4-4136-… 0102FB01 NA 2025-07-01T03:55:36.…
#> # ℹ 8,463 more rows
#> # ℹ 19 more variables: metadata.FP.CreatedBy <int>,
#> # metadata.FP.ModifiedAt <chr>, metadata.FP.ModifiedBy <int>,
#> # metadata.RMUniqueID <chr>, metadata.RMTitle <chr>, metadata.S3Key <chr>,
#> # metadata.S3ETag <chr>, metadata.RMNotes <lgl>, metadata.RMKeywords <lgl>,
#> # metadata.RMStatus <chr>, metadata.RMFileType <chr>,
#> # metadata.RMFileSize <int>, metadata.Timestamp <chr>, …