solr hierarchy solr8

Solr: search the entire index but return only the lowest-level descendants


I am completely lost trying to get the Solr ecosystem under my belt, maybe because the data structure I am dealing with is fundamentally difficult to handle correctly in Solr. I am trying to index the entries of a hierarchical classification system (NAICS: https://www23.statcan.gc.ca/imdb/p3VD.pl?Function=getVD&TVD=1181553).

The structure is as such:

What I want is to index the entire structure in Solr (using whatever means is recommended: nested documents, a category/path field, or some other solution). When a user's search terms are too broad to match at the lower levels of the structure and instead match higher up, all descendants of the matched entry should still be returned.

For example, a user searches 'oil seed farming' and gets a hit on the document representing 1111-Oil seed and Grain farming. What I want instead is to return only the leaf descendants of that entry (111110, 111111, 111120), as though they had been matched in the first place.

How does one accomplish this in Solr, and what are the options? The ultimate goal is to filter the structure down to its lowest-level leaves based on the user query.

Edit: based on the suggestions received, this is the approach I worked out.

curl http://localhost:8983/solr/NAICS/query -d '{
  "query": "{!join from=ANCESTOR_PATH to=DESCENDANT_PATH}NAICS:1111",
  "facet": {
    "TREE_NODES": {
      "type": "query",
      "q": "LEVEL:5",
      "facet" : {
        "TREE": {
          "type": "terms",
          "field": "DESCENDANT_PATH",
          "limit":-1
        }
      }
    }
  }
}'

Solution

  • Index each lowest-level leaf as its own document. In each document, include the terms from all of its ancestors, all the way up to the root. This will give you something like:

    {
      "id": "111110",
      "name": "Soybean Farming",
      "path": "11-Agriculture/111-Crop Production/1111-Oil seed and Grain farming/11111-D Soybean Farming",
      "categories": [
        "11-Agriculture",
        "111-Crop Production",
        "1111-Oil seed and Grain farming",
        "11111-D Soybean Farming"
      ]
    }
    

    This lets you search for any terms against the categories field, so a query that only matches a higher level still returns the leaf documents beneath it. If you use a string field (or a field with a path hierarchy tokenizer) for the path, you can also do exact matches to look up a specific branch of the hierarchy.
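As a rough illustration of the flattening described above, here is a minimal Python sketch that turns a parent/child table into one document per leaf, copying every ancestor label into `categories` and `path`. The `NODES` table, the helper names `ancestors` and `leaf_documents`, and the title given for 111120 are assumptions made for this example only; how you load the real NAICS data and post the documents to Solr is up to you.

```python
import json

# Hypothetical excerpt of the hierarchy: code -> (title, parent code or None).
# The real data would come from the NAICS source files.
NODES = {
    "11": ("Agriculture", None),
    "111": ("Crop Production", "11"),
    "1111": ("Oil seed and Grain farming", "111"),
    "111110": ("Soybean Farming", "1111"),
    "111120": ("Oilseed (except Soybean) Farming", "1111"),  # illustrative title
}


def ancestors(code):
    """Return 'code-title' labels from the root down to `code`, inclusive."""
    chain = []
    while code is not None:
        title, parent = NODES[code]
        chain.append(f"{code}-{title}")
        code = parent
    return chain[::-1]


def leaf_documents():
    """Build one document per leaf, carrying all ancestor labels."""
    # Any code that appears as somebody's parent is not a leaf.
    parents = {parent for _, parent in NODES.values() if parent is not None}
    docs = []
    for code, (title, _) in NODES.items():
        if code in parents:
            continue
        labels = ancestors(code)
        docs.append({
            "id": code,
            "name": title,
            "path": "/".join(labels),
            "categories": labels,
        })
    return docs


docs = leaf_documents()
print(json.dumps(docs, indent=2))
```

Because each leaf repeats the labels of its ancestors, a query that only matches "1111-Oil seed and Grain farming" in `categories` still returns 111110 and 111120 directly, with no join needed at query time. The trade-off is that ancestor text is duplicated across leaves, so a renamed category means reindexing every leaf under it.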