elasticsearchelasticsearch-queryelasticsearch-painlesselasticsearch-scripting

Select documents by array of objects when at least one object doesn't contain necessary field Elasticsearch


I have documents in the elasticsearch and can't understand how to apply search script that should return documents if any attachment doesn't contain uuid or uuid is null. Version of elastic 5.2. Mapping of documents

"mappings": {
    "documentType": {
        "properties": {
            "attachment": {
                "properties": {
                    "uuid": {
                        "type": "text"
                    },
                    "path": {
                        "type": "text"
                    },
                    "size": {
                        "type": "long"
                    }
                }
            }}}

In the elasticsearch it looks like

{
        "_index": "documents",
        "_type": "documentType",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "21321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "path": "../uploads/somepath",
                "size":1231
               },      
         ]},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "223645641321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "uuid": "22341424321321",
                "path": "../uploads/somepath",
                "size":1231
               },        
         ]},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "22789789341321321",
                "path": "../uploads/somepath",
                "size":1231
               }, 
               {
                "path": "../uploads/somepath",
                "size":1231
               },      
         ]}

As result I want to get attachments with _id 1 and 3. But as result I get error of the script I tried to apply next script:

{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}

Error is next:

 "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
                    "for (item in doc['attachment'].value) { ",
                    "                 ^---- HERE"
                ],
                "script": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                "lang": "painless"
            }
        ],

Is it possible to select documents in case even one attachment object doesn't contain uuid ?


Solution

  • Iterating arrays of objects is not as trivial as one would expect. I've written extensively about it here and here.

    Since your attachments are not defined as nested, ES will internally represent them as flattened lists of values (also called "doc values"). For instance attachment.uuid in doc#2 will become ["223645641321321", "22341424321321"], and attachments.size will turn into [1231, 1231].

    This means that you can simply compare the .length of these flattened representations! I assume attachment.size will always be present and can be thus taken as the comparison baseline.

    One more thing. To take advantage of these optimized doc values for textual fields, it'll require one small mapping change:

    PUT documents/documentType/_mappings
    {
      "properties": {
        "attachment": {
          "properties": {
            "uuid": {
              "type": "text",
              "fielddata": true     <---
            },
            "path": {
              "type": "text"
            },
            "size": {
              "type": "long"
            }
          }
        }
      }
    }
    

    When that's done and you've reindexed your docs — which can be done with this little Update by query trick:

    POST documents/_update_by_query
    

    You can then use the following script query:

    POST documents/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "attachment"
              }
            },
            {
              "script": {
                "script": {
                  "inline": "def size_field_length = doc['attachment.size'].length; def uuid_field_length =  doc['attachment.uuid'].length; return uuid_field_length < size_field_length",
                  "lang": "painless"
                }
              }
            }
          ]
        }
      }
    }