Is it possible to perform a More Like This query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html) on text inside a nested datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)?
The document that I'd like to query looks something like this (I have no control over its format, since the data is owned by another party):
{
  "communicationType": "Email",
  "timestamp": 1497633308917,
  "textFields": [
    {
      "field": "Subject",
      "text": "This is the subject of the email"
    },
    {
      "field": "To",
      "text": "to-email@domain.com"
    },
    {
      "field": "Body",
      "text": "This is the body of the email"
    }
  ]
}
I would like to perform a More Like This query on the body of the email. Before, the documents used to look like this:
{
  "communicationType": "Email",
  "timestamp": 1497633308917,
  "textFields": {
    "subject": "This is the subject of the email",
    "to": "to-email@domain.com",
    "body": "This is the body of the email"
  }
}
And I was able to perform a More Like This query on the email body like this:
{
  "query": {
    "bool": {
      "must": {
        "more_like_this": {
          "fields": ["textFields.body"],
          "like": "This is a similar body of an email",
          "min_term_freq": 1
        }
      },
      "filter": [
        { "term": { "communicationType": "Email" } },
        { "range": { "timestamp": { "gte": 1497633300000 } } }
      ]
    }
  }
}
Now that the old data source has been deprecated, I need to perform an equivalent query on the new data source, which has the email body inside the nested datatype. I only want to compare the like text against the "text" values whose "field" is "Body".
Is this possible? If so, what would the query look like? And would there be a major performance hit from running the query on the nested datatype compared to the non-nested documents before? Even after applying the timestamp and communicationType filters, there will still be tens of millions of documents that each query needs to compare the like text against, so performance matters.
Actually, it turned out to be straightforward to use a More Like This query inside a nested query:
{
  "query": {
    "bool": {
      "must": {
        "nested": {
          "path": "textFields",
          "query": {
            "bool": {
              "must": {
                "more_like_this": {
                  "fields": ["textFields.text"],
                  "like": "This is a similar body of an email",
                  "min_term_freq": 1
                }
              },
              "filter": {
                "term": { "textFields.field": "Body" }
              }
            }
          }
        }
      },
      "filter": [
        {
          "term": {
            "communicationType": "Email"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": 1497633300000
            }
          }
        }
      ]
    }
  },
  "min_score": 2
}
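If the query needs to be generated from application code, here is a minimal Python sketch that builds the same request body as a plain dict. The helper name `build_nested_mlt_query` and its parameters are hypothetical, not part of any client library. It assumes that `textFields` is mapped as a `nested` type and that `textFields.field` is indexed for exact matching (otherwise the `term` filter on "Body" may not match). It uses the `like` parameter, as in the original non-nested query.

```python
import json


def build_nested_mlt_query(like_text, field_name="Body", min_timestamp=None,
                           communication_type="Email", min_score=None):
    """Build a nested More Like This search body as a plain dict.

    Hypothetical helper: it only assembles the JSON, it does not talk to
    Elasticsearch.
    """
    # The inner bool is evaluated against each nested textFields object on
    # its own, so the MLT clause and the field-name filter have to match
    # the same object (for example the one whose "field" is "Body").
    nested_clause = {
        "nested": {
            "path": "textFields",
            "query": {
                "bool": {
                    "must": {
                        "more_like_this": {
                            "fields": ["textFields.text"],
                            "like": like_text,
                            "min_term_freq": 1,
                        }
                    },
                    "filter": {"term": {"textFields.field": field_name}},
                }
            },
        }
    }

    # The outer filters apply to the parent document and do not contribute
    # to the score.
    filters = [{"term": {"communicationType": communication_type}}]
    if min_timestamp is not None:
        filters.append({"range": {"timestamp": {"gte": min_timestamp}}})

    body = {"query": {"bool": {"must": nested_clause, "filter": filters}}}
    if min_score is not None:
        body["min_score"] = min_score
    return body


body = build_nested_mlt_query(
    "This is a similar body of an email",
    min_timestamp=1497633300000,
    min_score=2,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Keeping the "Body" filter inside the `nested` clause, rather than next to the outer filters, is what restricts the similarity comparison to the body text instead of the subject or recipient entries.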