elasticsearch, nested, range-query

Nested Query with Date Range


Can somebody confirm whether the query I've constructed is correct?

I have a User mapping in which Transaction is a nested field, and I'm looking for Users that bought a specific item before a given date. Each Transaction also contains line_items, which is itself a nested field.

Originally, I constructed the following query, but based on the data it returned I concluded that it is probably incorrect.

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "transaction",
                        "query": {
                            "nested": {
                                "path": "transaction.line_items",
                                "query": {
                                    "bool": {
                                        "must": {
                                            "match": {
                                                "transaction.line_items.barcode": {
                                                    "query": "abc123xyz"
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                {
                    "nested": {
                        "path": "transaction",
                        "query": {
                            "range": {
                                "transaction.timestamp": {
                                    "from": null,
                                    "include_lower": true,
                                    "include_upper": true,
                                    "to": "2021-05-14T00:00:00+02:00"
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
}

I then updated my query, and based on the results it now returns I think it is correct. However, to avoid confirmation bias, I'd like somebody to explain why the second query is correct (assuming it is).

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "transaction",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "range": {
                                            "transaction.timestamp": {
                                                "from": null,
                                                "include_lower": true,
                                                "include_upper": true,
                                                "to": "2021-05-16T00:00:00+02:00"
                                            }
                                        }
                                    },
                                    {
                                        "nested": {
                                            "path": "transaction.line_items",
                                            "query": {
                                                "bool": {
                                                    "must": {
                                                        "match": {
                                                            "transaction.line_items.barcode": {
                                                                "query": "abc123xyz"
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
}

Solution

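  • Elasticsearch indexes every nested object as a separate hidden document, and a nested query evaluates its inner query against one nested document at a time.

    Suppose you have a document A with two nested transactions, b and d.

    In your first attempt the two nested clauses are evaluated independently. If the date query matches only d and the barcode query matches only b, A is still returned, because each clause found some matching transaction.

    In your second attempt both conditions sit inside the same nested query, so they have to match the same nested transaction. In this example A is not returned by the second attempt, so the second query is the correct one for "bought this item before this date".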
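
    The difference can be sketched outside Elasticsearch with a small in-memory model. This is only an illustration of the matching semantics, not how Elasticsearch is implemented: the sample user, the helper names, and the function names are all hypothetical, the barcode match is simplified to string equality, and time zones are ignored.

```python
from datetime import datetime

# Hypothetical user "A" with two nested transactions, "b" and "d".
# b contains the barcode but is after the cutoff; d is before the cutoff
# but does not contain the barcode.
user_a = {
    "name": "A",
    "transaction": [
        {"id": "b", "timestamp": datetime(2021, 6, 1),
         "line_items": [{"barcode": "abc123xyz"}]},
        {"id": "d", "timestamp": datetime(2021, 5, 1),
         "line_items": [{"barcode": "other"}]},
    ],
}

CUTOFF = datetime(2021, 5, 14)  # simplified: naive datetime, no time zone
BARCODE = "abc123xyz"


def has_barcode(tx):
    # Stand-in for the inner nested query on transaction.line_items.
    return any(item["barcode"] == BARCODE for item in tx["line_items"])


def before_cutoff(tx):
    # Stand-in for the range query on transaction.timestamp (inclusive upper bound).
    return tx["timestamp"] <= CUTOFF


def first_query_matches(user):
    # Two sibling nested clauses: each one may be satisfied by a
    # different transaction of the same user.
    txs = user["transaction"]
    return any(has_barcode(t) for t in txs) and any(before_cutoff(t) for t in txs)


def second_query_matches(user):
    # One nested clause with both conditions: a single transaction
    # has to satisfy the barcode and the date condition together.
    return any(has_barcode(t) and before_cutoff(t) for t in user["transaction"])


print(first_query_matches(user_a))   # True: b matches the barcode, d matches the date
print(second_query_matches(user_a))  # False: no single transaction matches both
```

    In this model the first form returns A even though the item was never bought before the cutoff, while the second form excludes A.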