python arrays elasticsearch

Elasticsearch query with Python lists returns "Can't get text on a START_ARRAY at 1:30"


I'm new to Elasticsearch (and also to Python), but I've had some first success:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'], basic_auth=('user', 'pw'))
query_body = {
  "query": {
      "match": {
          "content": "English"
      }
  }
}

response = es.search(index="resumes", body=query_body)

This works fine. But I have a list of keywords, so I need a way to pass a list as input instead of a single string.

I found this example, "How to make elasticsearch query with Python lists", and it seems simple: just change the code to:

query_body = {
  "query": {
      "match": {
          "content": ["English", "German"]
      }
  }
}

as used in the other thread. But I get an exception: "Can't get text on a START_ARRAY at 1:30".

Some more research points to an earlier bug report, but I expect it is out of date: https://github.com/elastic/elasticsearch/issues/15741

I'm using Elasticsearch 8.15.1. Maybe somebody knows a way to make this work?

Thank you, Andre


Solution

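  • TL;DR

    The match query expects a single text value for a field, not an array, which is why passing a list fails with the START_ARRAY parsing error. There are ways around it.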

    Demo

    Dataset

    POST 79536558/_bulk
    {"index":{}}
    {"content": "this is english"}
    {"index":{}}
    {"content": "das ist deutsch"}
    {"index":{}}
    {"content": "C'est du francais"}
    

    Solution 1 - boolean with should

    
    GET 79536558/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "content": "english"
              }
            },
            {
              "match": {
                "content": "deutsch"
              }
            }
          ]
        }
      }
    }
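
Since the question starts from a Python list, here is a minimal sketch of how the same bool/should body could be built from that list. The helper name build_should_query is hypothetical (it is not part of the Elasticsearch client), and the index name "resumes" and the `es` client are taken from the question.

```python
def build_should_query(field, keywords):
    """Build a bool/should query body with one match clause per keyword.

    Hypothetical helper for illustration; not part of the Elasticsearch client.
    """
    if not keywords:
        raise ValueError("keywords must contain at least one entry")
    return {
        "query": {
            "bool": {
                # One match clause per keyword, mirroring the JSON request above
                "should": [{"match": {field: keyword}} for keyword in keywords]
            }
        }
    }


query_body = build_should_query("content", ["English", "German"])

# With a running cluster, this would be sent as in the question:
# response = es.search(index="resumes", body=query_body)
```

Each keyword stays a separate match clause, so every entry is analyzed the same way a single-string match would be.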
    

    Solution 2 - query string

    GET 79536558/_search
    {
      "query": {
        "query_string": {
          "default_field": "content",
          "query": "english OR deutsch"
        }
      }
    }
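
The query string variant can be sketched from a Python list in the same way. The helper name build_query_string_query is hypothetical, and the sketch assumes the keywords are plain words without characters that have a special meaning in the query_string syntax (such as quotes, colons, or parentheses); keywords containing those would need escaping first.

```python
def build_query_string_query(field, keywords):
    """Build a query_string query body that ORs all keywords together.

    Hypothetical helper for illustration. Assumes keywords are plain words
    with no reserved query_string characters.
    """
    if not keywords:
        raise ValueError("keywords must contain at least one entry")
    return {
        "query": {
            "query_string": {
                "default_field": field,
                # Join the list into a single "a OR b OR c" expression
                "query": " OR ".join(keywords),
            }
        }
    }


query_body = build_query_string_query("content", ["English", "German"])

# With a running cluster, this would be sent as in the question:
# response = es.search(index="resumes", body=query_body)
```

For keywords that come from user input, the bool/should form is probably the safer choice, because each value is passed as data rather than parsed as query syntax.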