
Solr - Indexing Nested Child Documents


I am trying a simple example: creating a schema for a document with nested child documents.

{
    "id" : "467eead3-7562-41fb-a406-3b4e62bdff9f",
    "title": "my test ticket",
    "participants": [ 
        {
            "id" : "487b5309-8f1e-4608-8e91-35cb10d27cac",
            "name" : "jakub"
        },
        {
            "id" : "6ddf6b8b-476a-4b7c-b1d6-75324fed3a55",
            "name" : "kiran"
        }
    ]
}

When I paste this into the Schema Designer in the Solr Admin UI, it reports a series of errors: first about a "missing root field", then about an invalid field "name".
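The "missing root field" error presumably refers to Solr's _root_ field. Solr indexes every nested object as a separate document, stores a parent and its children together as one block (children first, parent last), and records the top-level document's id in _root_ on every document of that block. This is why a schema that accepts nested documents needs _root_. The Python sketch below only imitates that bookkeeping for the ticket above so the effect is visible. block_with_root is a hypothetical helper, not part of Solr or SolrJ, and it handles a single level of nesting.

```python
import json

# The ticket from the question, as nested JSON.
ticket = {
    "id": "467eead3-7562-41fb-a406-3b4e62bdff9f",
    "title": "my test ticket",
    "participants": [
        {"id": "487b5309-8f1e-4608-8e91-35cb10d27cac", "name": "jakub"},
        {"id": "6ddf6b8b-476a-4b7c-b1d6-75324fed3a55", "name": "kiran"},
    ],
}


def block_with_root(doc, child_key="participants"):
    """Return the flat list of documents in one block, each tagged with _root_.

    Hypothetical illustration of Solr's internal bookkeeping, not Solr code.
    Children come first and the parent last, matching the block-join order.
    Only one level of nesting is handled.
    """
    root_id = doc["id"]
    block = [{**child, "_root_": root_id} for child in doc.get(child_key, [])]
    # The parent keeps its own fields; the child container is not a field.
    parent = {k: v for k, v in doc.items() if k != child_key}
    parent["_root_"] = root_id
    block.append(parent)
    return block


for d in block_with_root(ticket):
    print(json.dumps(d))
```

Each printed line is one document, and all three share the same _root_ value: the ticket's id.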

So I went to the official documentation on indexing nested child documents, https://solr.apache.org/guide/8_0/indexing-nested-documents.html, and tried its example:

{
    "id": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "_childDocuments_": [
        {
            "id": "2",
            "comments": "SolrCloud supports it too!"
        }
    ]
}, {
    "id": "3",
    "title": "New Lucene and Solr release is out",
    "content_type": "parentDocument",
    "_childDocuments_": [
        {
            "id": "4",
            "comments": "Lots of new features"
        }
    ]
}
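Several top-level documents form one valid JSON body only when they are wrapped in an array. Two objects joined by a comma, which is how the snippet appears here, are not a single JSON value. Whether that alone explains the designer's refusal is not established. The quick Python check below only illustrates the difference using shortened, hypothetical documents.

```python
import json

# Two objects separated by a comma, shaped like the snippet above (shortened).
two_objects = '{"id": "1", "title": "a"}, {"id": "3", "title": "b"}'
# The same two documents wrapped in a JSON array.
as_array = "[" + two_objects + "]"


def is_valid_json(text):
    """Return True if text parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(two_objects))  # parser stops with "Extra data" after the first object
print(is_valid_json(as_array))     # one list containing two documents
```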

The Schema Designer refuses to process even this example.

I am not sure how to approach this, and I suspect I am misunderstanding the whole concept. When indexing child documents:

  1. Do I define the schema for the parent and child documents separately?
  2. What should the child collection look like when indexing such a JSON document? Do I pass in the IDs of the child documents? Would the field type be "strings", marked as "multivalued"?
  3. Is it even possible to define nested child documents with the Schema Designer tool?
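On questions 1 and 2: parent and child documents live in the same collection and are described by one shared schema. Children are sent as full nested documents rather than as a multivalued list of IDs, so the schema needs the union of the parent's and the children's fields. As an alternative to the Schema Designer, fields can be added through the Schema API by POSTing a JSON body with an "add-field" command to the collection's /schema endpoint (for a hypothetical collection "tickets": http://localhost:8983/solr/tickets/schema). The Python sketch below builds such a body from a sample document. Several things are assumptions for illustration: the child key, the set of fields already defined in the configset, sending several fields as an array under one "add-field" command, and using the "string" type for every field.

```python
import json

# Hypothetical sample: the ticket from the question, trimmed to one participant.
sample = {
    "id": "467eead3-7562-41fb-a406-3b4e62bdff9f",
    "title": "my test ticket",
    "participants": [
        {"id": "487b5309-8f1e-4608-8e91-35cb10d27cac", "name": "jakub"},
    ],
}

CHILD_KEY = "participants"   # assumption: the only key that holds child documents
EXISTING = {"id", "_root_"}  # assumption: fields the configset already defines


def field_names(doc, child_key=CHILD_KEY):
    """Collect field names from the parent and every child (one nesting level)."""
    names = {k for k in doc if k != child_key}
    for child in doc.get(child_key, []):
        names.update(child)  # adds the child's keys
    return names


def add_field_commands(doc):
    """Build one Schema API body that adds every field not already defined."""
    new_fields = sorted(field_names(doc) - EXISTING)
    return {
        "add-field": [
            # "string" is an exact-match type; a text type may suit "title" better.
            {"name": name, "type": "string", "stored": True}
            for name in new_fields
        ]
    }


print(json.dumps(add_field_commands(sample), indent=2))
```

Because the schema is shared, "name" is added once and applies equally to any document, parent or child, that carries it.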

Solution

  • I worked it out in the end. The fix is to flatten the schema: the root document needs to contain the fields that the child documents have (here "name"), so that the designer can parse the document correctly.

    The working JSON would look like this:

    {
        "_root_": "467eead3-7562-41fb-a406-3b4e62bdff9f",
        "id": "467eead3-7562-41fb-a406-3b4e62bdff9f",
        "content_type": "ticket",
        "title": "my test ticket",
        "name": null,
        "participants": [{
                "id": "487b5309-8f1e-4608-8e91-35cb10d27cac",
                "content_type": "participant",
                "name": "jakub"
            }, {
                "id": "6ddf6b8b-476a-4b7c-b1d6-75324fed3a55",
                "content_type": "participant",
                "name": "kiran"
            }
        ]
    }
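The workaround can be automated before a sample is pasted into the designer. The minimal Python sketch below copies each child-only field onto the root with a null value, as the JSON above does with "name". It assumes the child documents sit under "participants" and are only one level deep. add_placeholders is a hypothetical helper, and it does not add the "content_type" or "_root_" values shown above.

```python
import json

CHILD_KEY = "participants"  # assumption: the key that holds the child documents


def add_placeholders(doc, child_key=CHILD_KEY):
    """Return a shallow copy of doc whose root carries every child field.

    Fields present only on children are added to the root as None (JSON null),
    so a designer that inspects the top level sees them. The input is not modified.
    """
    flat = dict(doc)
    for child in doc.get(child_key, []):
        for field in child:
            flat.setdefault(field, None)
    return flat


ticket = {
    "id": "467eead3-7562-41fb-a406-3b4e62bdff9f",
    "title": "my test ticket",
    "participants": [
        {"id": "487b5309-8f1e-4608-8e91-35cb10d27cac", "name": "jakub"},
        {"id": "6ddf6b8b-476a-4b7c-b1d6-75324fed3a55", "name": "kiran"},
    ],
}

print(json.dumps(add_placeholders(ticket), indent=4))
```

setdefault leaves any field the root already has (such as "id") untouched, so only genuinely child-only fields receive a null placeholder.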