Tags: python, elasticsearch, elasticsearch-dsl, elasticsearch-dsl-py

elasticsearch-dsl in python: How can I return the "inner hits" for my query?


I am exploring Elasticsearch in Python using the elasticsearch_dsl library. My Elasticsearch knowledge is still limited.

I have created a model like so:

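from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Object, Text


class Post(InnerDoc):
    text = Text()
    id = Integer()


class User(Document):
    name = Text()
    posts = Object(doc_class=Post)  # refers to the Post class defined above
    signed_up_at = Date()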

The data for posts is an array like this:

[
 { 
    "text": "Test",
    "id": 2
 },
]

Storing my posts works, but it seems wrong to me: I declared the "posts" attribute as a single Post, not as a list of Posts.

Querying works. I can run:

  s = Search(using=client).query("match", posts__text="test")

and get back every User that has a post containing the search term. What I want is the user plus all the posts that caused the user to match (that is, all posts containing the search phrase). I call these the "inner hits", but I am not sure that is the correct term.

Help would be highly appreciated!

I tried using "nested" instead of "match" for the query, but that does not work:

[nested] query does not support [posts]

I suspect this is because my index mapping is defined incorrectly.


Solution

  • I updated my model to this:

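    from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Nested, Text


    class Post(InnerDoc):
        text = Text(analyzer="snowball")  # stemming analyzer for the post text
        id = Integer()


    class User(Document):
        name = Text()
        posts = Nested(doc_class=Post)  # Nested instead of Object; refers to Post above
        signed_up_at = Date()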
    

    This allows me to do the following query:

    GET users/_search
    {
      "query": {
        "nested": {
          "path": "posts",
          "query": {
            "match": {
              "posts.text": "idea"
            }
          },
          "inner_hits": {} 
        }
      }
    }
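The same request body can also be assembled as a plain Python dict, which is handy when the number of inner hits per user needs tuning. The sketch below is only an illustration: `nested_match_body` and its parameters are hypothetical names, not part of any library. As far as I know, Elasticsearch returns only the first three inner hits per parent document unless `size` is set inside `inner_hits`, so a larger value is worth considering when a user can have many matching posts.

```python
import json


def nested_match_body(path, field, text, inner_hits_size=None):
    """Build a nested match query body that asks for inner hits.

    Hypothetical helper for illustration only.
    """
    inner_hits = {}
    if inner_hits_size is not None:
        # Optional cap on how many matching nested objects are returned per parent.
        inner_hits["size"] = inner_hits_size
    return {
        "query": {
            "nested": {
                "path": path,
                # The field name must carry the nested path as a prefix.
                "query": {"match": {f"{path}.{field}": text}},
                "inner_hits": inner_hits,
            }
        }
    }


body = nested_match_body("posts", "text", "idea", inner_hits_size=10)
print(json.dumps(body, indent=2))
```

Called without `inner_hits_size`, the helper produces the same body as the request shown above.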
    

    This translates to the following elasticsearch-dsl query in python:

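    from elasticsearch_dsl import Q, Search

    # client is an existing Elasticsearch connection
    s = Search(using=client).query(
        "nested",
        path="posts",
        # "match" mirrors the JSON query above; the field name needs the
        # nested path prefix "posts." (not "post.")
        query=Q("match", **{"posts.text": "idea"}),
        inner_hits={},
    )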
    

    Access inner hits like this:
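As a minimal sketch, assuming the raw JSON body of the search response is available as a Python dict, the matching posts sit under each top-level hit in `inner_hits.<name>.hits.hits`, where the name defaults to the nested path ("posts" here). The helper `matching_posts` and the sample response are made up for illustration. The sample is hand-written to mirror the response layout and is not real output.

```python
def matching_posts(response, inner_hits_name="posts"):
    """Yield (user_source, [post_source, ...]) for each top-level hit.

    Hypothetical helper; expects the raw search response as a dict.
    """
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        posts = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
        yield hit["_source"], posts


# Hand-written sample shaped like a search response (assumption, not real data).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "name": "alice",
                    "posts": [
                        {"text": "An idea", "id": 1},
                        {"text": "Test", "id": 2},
                    ],
                },
                "inner_hits": {
                    "posts": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "posts", "offset": 0},
                                    "_source": {"text": "An idea", "id": 1},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for user, posts in matching_posts(sample_response):
    # Only the post that matched the nested query is listed.
    print(user["name"], [p["id"] for p in posts])
```

With elasticsearch-dsl itself, the same data should be reachable on each hit of `s.execute()` as `hit.meta.inner_hits.posts`, which can be iterated like a normal response. Treat that attribute path as an assumption to verify against the installed version.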

    
    

    Using Nested is needed here because of how Elasticsearch represents objects internally (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html). With the plain Object type, a list of objects is flattened into multi-value fields, so the association between the text and the id of an individual post is lost and complete inner hits cannot be retrieved.
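The flattening effect can be imitated in plain Python. This is a simplified model of the behaviour described in the linked documentation, not Elasticsearch code, and all function names are hypothetical. It shows why a flattened list can match a combination of values that never occurs in a single post, while a per-object (nested) check does not.

```python
def flatten_objects(field, objects):
    """Mimic how a list of plain objects is stored as multi-value fields."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def flattened_match(flat, field, conditions):
    """Each condition only needs to match some value of its own field."""
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in conditions.items()
    )


def nested_match(objects, conditions):
    """Return the objects in which all conditions hold together."""
    return [
        obj for obj in objects
        if all(obj.get(key) == value for key, value in conditions.items())
    ]


posts = [{"text": "idea", "id": 1}, {"text": "test", "id": 2}]
flat = flatten_objects("posts", posts)
print(flat)

# No single post has text "idea" AND id 2.
cross = {"text": "idea", "id": 2}
print(flattened_match(flat, "posts", cross))  # True: the association is lost
print(nested_match(posts, cross))             # []: each post is checked on its own
```

The nested variant also returns the matching objects themselves, which is the information inner hits expose.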