rest elasticsearch dsl

Query for a string in an array element of an API using a URL endpoint


I'm using the Art Institute of Chicago API (https://api.artic.edu/docs/#introduction) and there is an element called subject_titles which is an array of strings. I want to query the API to show me all the results which contain the string "landscapes" in the subject_titles, rather than scrape the API and search for the string on my end.

Some failed examples of what I have tried:

https://api.artic.edu/api/v1/artworks/search?q=[subject_titles]=landscapes

https://api.artic.edu/api/v1/artworks/search?query[terms][subject_titles]=landscape

I reckon the fix is to replace '[terms]' with a different specifier, but I can't find which one. All my research turns up results that use the Elasticsearch API, but I'm pretty new to this and that seems like a can of worms I don't want to open (why do I need one API to query another API? Also, the DSL looks like a headache to learn syntax-wise), but I will learn it if I have to. Is there a way to do this using the simple REST-style URL endpoint?


Solution

  • TL;DR: https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape

    If you want a more detailed explanation: I think they tried to make the interface powerful and concise so you can run structured queries with just a URL, but I agree, it is a bit confusing.

    It looks like the URL parameters are translated into top-level elements of the Elasticsearch query DSL, and a parameter name of the form foo[bar] is translated into foo with bar nested inside it. So foo[bar][baz]=10 will be translated into

    {
      "foo": {
        "bar": {
         "baz": 10
        }
      }
    }
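
    The translation above can be sketched in a few lines of Python. This is only an illustration of the bracket-to-nesting rule as I understand it, not the API's actual implementation; the helper name nest_params is made up for this example.

```python
def nest_params(params):
    """Expand keys such as 'foo[bar][baz]' into nested dictionaries.

    Hypothetical helper: it mimics the observed behaviour of the API,
    it is not code taken from the API itself.
    """
    result = {}
    for key, value in params.items():
        # 'foo[bar][baz]' -> ['foo', 'bar', 'baz']
        parts = key.replace("]", "").split("[")
        node = result
        # Walk (or create) one dictionary level per leading part.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last part holds the actual value.
        node[parts[-1]] = value
    return result


nested = nest_params({"foo[bar][baz]": 10})
print(nested)  # {'foo': {'bar': {'baz': 10}}}
```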
    

    With this information in mind we can reverse engineer query[term][is_public_domain]=true into

    {
      "query": {
        "term": {
          "is_public_domain": true
        }
      }
    }
    

    If we now open the Elasticsearch documentation, we can figure out that term is the type of the query, and that this query will find all documents where the field is_public_domain contains true. We need to search a different field for a different value, so we replace is_public_domain with subject_titles and true with landscape. Term works well for boolean fields such as is_public_domain, but strings are better searched with another query type: match. So we should also replace term with match. In the end we get the following query:

    {
      "query": {
        "match": {
          "subject_titles": "landscape"
        }
      }
    }
    

    Now we can convert it back into URL representation: query[match][subject_titles]=landscape and if we stick it back on the URL we get

    https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape
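
    Going from the JSON query back to the URL form is the reverse walk: each level of nesting becomes one pair of brackets. A minimal Python sketch, assuming the bracket convention described above; flatten_query is a hypothetical helper name, and the values are not percent-encoded here to keep the output readable.

```python
def flatten_query(body, prefix=""):
    """Turn a nested dict into (bracketed_name, value) pairs.

    Hypothetical helper illustrating the naming convention only.
    """
    pairs = []
    for key, value in body.items():
        # The top-level key stays bare; deeper keys are wrapped in brackets.
        name = f"{prefix}[{key}]" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten_query(value, name))
        else:
            pairs.append((name, value))
    return pairs


body = {"query": {"match": {"subject_titles": "landscape"}}}
pairs = flatten_query(body)
query_string = "&".join(f"{name}={value}" for name, value in pairs)
print(query_string)  # query[match][subject_titles]=landscape
```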

    This will give us the first 10 hits. If we want more, we can add limit:

    https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape&limit=100

    and if we want even more we can start paging through the results using the page parameter

    https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape&limit=100&page=2
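
    If you are building these URLs from a script, something like the following Python sketch may help. It only assembles the URL and sends no request. The function name search_url is made up for this example, and the default page=1 is my assumption; the default limit of 10 matches the behaviour described above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.artic.edu/api/v1/artworks/search"


def search_url(field, text, limit=10, page=1):
    """Build a match-query search URL (hypothetical helper)."""
    params = {
        f"query[match][{field}]": text,
        "limit": limit,
        "page": page,
    }
    # safe="[]" keeps the brackets readable instead of encoding them as %5B/%5D.
    return f"{BASE_URL}?{urlencode(params, safe='[]')}"


print(search_url("subject_titles", "landscape", limit=100, page=2))
```

    To fetch every hit you would call this in a loop, increasing page until a response comes back with no results.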