elasticsearch

Can I filter an array in elastic?


I had to insert a large amount of data into Elasticsearch, and I indexed it in the following shape. I need to query this object, but I am unable to filter the "logData" array. Can someone help me out here? Is it even possible to filter an array in Elasticsearch?

"_source": {
  "FileName": "fileName.log",
  "logData": [
    {
       "LineNumber": 1,
       "Data": "data1"
    },
    {
       "LineNumber": 2,
       "Data": "Data2"
    },
    {
       "LineNumber": 3,
       "Data": "Data3"
    },
    {
       "LineNumber": 4,
       "Data": "Data4"
    },
    {
       "LineNumber": 5,
       "Data": "Data5"
    },
    {
       "LineNumber": 6,
       "Data": "Data6"
    }
  ]
}

Is there a way to query so that I get only a few items from this array? For example:

"_source": {
  "FileName": "fileName.log",
  "logData": [
    {
       "LineNumber": 1,
       "Data": "data1"
    },
    {
       "LineNumber": 2,
       "Data": "Data2"
    },
    {
       "LineNumber": 3,
       "Data": "Data3"
    }
  ]
}

Solution

  • There's no dedicated array mapping type in ES.

    That said, when you have an array of objects with shared keys, it's recommended that you use the nested field type to preserve the connections between each sub-object's attributes. If you don't use nested, the objects are flattened, which may lead to seemingly wrong query results.
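
To see why flattening can produce seemingly wrong results, here is a rough plain-Python model of the two behaviours. It is only a sketch and does not call Elasticsearch: `flatten`, `matches_flattened` and `matches_nested` are hypothetical helpers, and text analysis (tokenizing, lowercasing) is ignored.

```python
from collections import defaultdict

# Two entries taken from the question's logData array.
log_data = [
    {"LineNumber": 1, "Data": "data1"},
    {"LineNumber": 2, "Data": "Data2"},
]


def flatten(path, objects):
    """Model of a non-nested object array: one multi-valued field per key."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{path}.{key}"].append(value)
    return dict(flat)


def matches_flattened(flat, path, conditions):
    """Each condition is checked on its own; the key/value pairing is lost."""
    return all(
        value in flat.get(f"{path}.{key}", [])
        for key, value in conditions.items()
    )


def matches_nested(objects, conditions):
    """All conditions must hold inside the same sub-object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


flat = flatten("logData", log_data)

# No single entry has LineNumber 1 together with "Data2".
cross_object = {"LineNumber": 1, "Data": "Data2"}
false_positive = matches_flattened(flat, "logData", cross_object)
nested_result = matches_nested(log_data, cross_object)
```

In this model the flattened lookup reports a match for LineNumber 1 together with "Data2", even though no single entry holds both, while the nested-style check does not.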


    As to the actual query -- assuming your mapping looks something like this:

    PUT logs_index
    {
      "mappings": {
        "properties": {
          "logData": {
            "type": "nested"
          }
        }
      }
    }
    

    you'll need to filter those logData sub-documents of interest, perhaps with a terms query. Only then can you extract just the array objects that matched this query (LineNumber: 1 or 2 or 3).

    The technique for that is called inner_hits:

    POST logs_index/_search
    {
      "_source": ["FileName"],
      "query": {
        "nested": {
          "path": "logData",
          "query": {
            "terms": {
              "logData.LineNumber": [
                1,
                2,
                3
              ]
            }
          },
          "inner_hits": {}
        }
      }
    }
    

    Check this thread for more info.
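
The matched sub-documents come back in a separate inner_hits section of each hit rather than inside _source. Below is a minimal Python sketch of collecting them, assuming the search response has already been parsed into a dict. `matched_log_lines` is a hypothetical helper, and `sample_response` is a hand-written, trimmed illustration of the response shape, not real output.

```python
def matched_log_lines(response, name="logData"):
    """Return one record per top-level hit, holding only the matched entries."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        # inner_hits are keyed by name, which defaults to the nested path.
        inner = (
            hit.get("inner_hits", {})
            .get(name, {})
            .get("hits", {})
            .get("hits", [])
        )
        results.append(
            {
                "FileName": hit.get("_source", {}).get("FileName"),
                "logData": [entry["_source"] for entry in inner],
            }
        )
    return results


# Hand-written sample (assumption): only the fields the helper reads are kept.
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"FileName": "fileName.log"},
                "inner_hits": {
                    "logData": {
                        "hits": {
                            "hits": [
                                {"_source": {"LineNumber": 1, "Data": "data1"}},
                                {"_source": {"LineNumber": 2, "Data": "Data2"}},
                                {"_source": {"LineNumber": 3, "Data": "Data3"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

filtered = matched_log_lines(sample_response)
```

Note that inner_hits returns only the top three matches per document by default; a "size" value inside "inner_hits" raises that limit.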