I am currently exploring Elasticsearch in Python using the elasticsearch_dsl
library. I am aware that my Elasticsearch knowledge is limited.
I have created a model like so:
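from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Object, Text

class Post(InnerDoc):
    text = Text()
    id = Integer()

class User(Document):
    name = Text()
    # doc_class must reference the Post class defined above
    posts = Object(doc_class=Post)
    signed_up_at = Date()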
The data for posts is an array like this:
[
    {
        "text": "Test",
        "id": 2
    }
]
Storing my posts works. However, to me this seems wrong. I specify the "posts" attribute to be a Post - not a List of Posts.
Querying works, I can:
s = Search(using=client).query("match", posts__text="test")
This returns every User that has a post containing the search term. What I want is the user plus all posts that qualified the user for the result (meaning all posts containing the search phrase). I would call those the inner hits, but I am not sure that is the correct term.
Help would be highly appreciated!
I tried using "nested" instead of "match" for the query, but that does not work:
[nested] query does not support [posts]
I suspect that this has to do with the fact that my index is specified incorrectly.
I updated my model to this:
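from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Nested, Text

class Post(InnerDoc):
    text = Text(analyzer="snowball")
    id = Integer()

class User(Document):
    name = Text()
    # Nested indexes each post as its own hidden document
    posts = Nested(doc_class=Post)
    signed_up_at = Date()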
This allows me to do the following query:
GET users/_search
{
    "query": {
        "nested": {
            "path": "posts",
            "query": {
                "match": {
                    "posts.text": "idea"
                }
            },
            "inner_hits": {}
        }
    }
}
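This translates to the following elasticsearch-dsl query in Python:
from elasticsearch_dsl import Q, Search

# index="users" mirrors the "GET users/_search" request above
s = Search(using=client, index="users").query(
    "nested",
    path="posts",
    # same match query on posts.text as in the JSON body
    query=Q("match", **{"posts.text": "idea"}),
    inner_hits={},
)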
Access inner hits like this:
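A minimal sketch, assuming the raw response dict (for example from `s.execute().to_dict()`) follows the standard search response layout, where the inner hits of a nested query are keyed by the nested path `posts`. The helper name `matching_posts` and the sample response are hypothetical and hand-written for illustration. On elasticsearch-dsl hit objects the same data should be reachable via `hit.meta.inner_hits.posts`, but I have only checked the dict form here.

```python
def matching_posts(response):
    """Return a list of (user_source, [post_source, ...]) pairs.

    `response` is a raw search response dict. The inner hits are expected
    under hit["inner_hits"]["posts"]["hits"]["hits"].
    """
    results = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get("posts", {})
        posts = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
        results.append((hit["_source"], posts))
    return results


# Hand-written sample in the shape of a nested query response (hypothetical data).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "name": "Alice",
                    "posts": [
                        {"text": "Test", "id": 2},
                        {"text": "An idea", "id": 3},
                    ],
                },
                "inner_hits": {
                    "posts": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "posts", "offset": 1},
                                    "_source": {"text": "An idea", "id": 3},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

pairs = matching_posts(sample_response)
for user, posts in pairs:
    print(user["name"], posts)
```

Each pair holds the full user document plus only those posts that matched the nested query.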
Using Nested might be required because of how Elasticsearch represents objects internally (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html). With a plain Object field, an array of objects is flattened into one list of values per sub-field, so the association between the text and the id of an individual post is lost. That would explain why complete inner hits, each with the correct text and id for a post, can only be retrieved from a Nested field.
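To illustrate the flattening, here is a minimal pure-Python sketch. It only mimics the documented behaviour and does not talk to Elasticsearch; the function names and the second post (id 3) are hypothetical.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Mimic a non-nested object array: one list of values per sub-field."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def flat_matches(flat, text, post_id):
    """Both conditions are checked against independent value lists."""
    return text in flat["posts.text"] and post_id in flat["posts.id"]


def nested_matches(objects, text, post_id):
    """Both conditions must hold for the same post."""
    return any(o["text"] == text and o["id"] == post_id for o in objects)


posts = [
    {"text": "Test", "id": 2},
    {"text": "Idea", "id": 3},  # hypothetical second post
]

flattened = flatten_object_array("posts", posts)
print(flattened)

# No single post has text "Test" AND id 3, yet the flattened form matches.
cross_flat = flat_matches(flattened, "Test", 3)
cross_nested = nested_matches(posts, "Test", 3)
print(cross_flat, cross_nested)
```

The flattened form can no longer tell which text belongs to which id, which is why per-post inner hits need the nested representation.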