
Structural ordering for search


I am looking for a way to implement structural ordering for search results. I use Azure Search and have an index whose documents look like this (simplified):

[
    {
        "id": Guid,
        "name": string,
        "folderId": Guid
    }
]

name is the field I run the search queries against, and folderId identifies the folder the object lives in. Suppose I have this folder structure:

[
    {
        "id": "a595885e-520e-4fd2-9bdd-3f494f187b2e",
        "name": "folder1",
        "searchObjects": [],
        "folders": [
            {
                "id": "f760f2bd-7291-49ed-9be2-9546ce57fb87",
                "name": "subfolder1",
                "searchObjects": [],
                "folders": []
            }
        ]
    },
    {
        "id": "200ff3b6-310a-49d1-ad99-aed6f34a8f38",
        "name": "folder2",
        "searchObjects": [],
        "folders": []
    }
]

Each of these folders has 3000 searchable objects. I want to paginate the search results and retrieve the pages in the order of the folder structure. For example, if I request 5000 objects at a time, I would get:

Page 1: 3000 items from folder1 + 2000 items from subfolder1;

Page 2: 1000 items from subfolder1 + 3000 items from folder2.
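The page layout described above can be checked with a small sketch. It assumes the folders are already listed in the desired traversal order together with their object counts; the function name `paginate` is hypothetical and only illustrates the arithmetic, not anything Azure Search does.

```python
def paginate(ordered_counts, page_size):
    """Split folder-ordered object counts into pages.

    ordered_counts: list of (folder_name, object_count) in traversal order.
    Returns a list of pages; each page is a list of (folder_name, taken).
    """
    pages, current, room = [], [], page_size
    for folder, count in ordered_counts:
        while count > 0:
            take = min(count, room)      # fill the current page as far as possible
            current.append((folder, take))
            count -= take
            room -= take
            if room == 0:                # page is full, start a new one
                pages.append(current)
                current, room = [], page_size
    if current:                          # last, partially filled page
        pages.append(current)
    return pages


pages = paginate(
    [("folder1", 3000), ("subfolder1", 3000), ("folder2", 3000)],
    page_size=5000,
)
for number, page in enumerate(pages, start=1):
    print(number, page)
```

With 3000 objects per folder and 5000 objects per page, this reproduces the two pages listed above.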

My initial thought was to calculate a folder index before putting the searchable objects into Azure Search, e.g. a folder index like this:

[
    {
        "index": 1,
        "name": "folder1",
        "folders": [
            {
                "index": 11,
                "name": "subfolder1"
            },
            {
                "index": 12,
                "name": "subfolder2"
            },
            {
                "index": 13,
                "name": "subfolder3",
                "folders": [
                    {
                        "index": 131,
                        "name": "subSubfolder1"
                    }
                ]
            }
        ]
    },
    {
        "index": 2,
        "name": "folder2",
        "folders": [
            {
                "index": 21,
                "name": "subfolder2"
            }
        ]
    }
]
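The numbering in this tree appears to be the parent's index followed by the child's position among its siblings. A minimal sketch of that scheme, assuming this reading is right (the function name `assign_indexes` and the `/`-joined path strings are hypothetical, used only to tell apart folders with the same name):

```python
def assign_indexes(folders, prefix="", parent_path=""):
    """Depth-first walk returning (path, index) pairs.

    A folder's index is its parent's index with its own 1-based
    position among its siblings appended.
    """
    rows = []
    for position, folder in enumerate(folders, start=1):
        index = f"{prefix}{position}"
        path = f"{parent_path}/{folder['name']}" if parent_path else folder["name"]
        rows.append((path, int(index)))
        rows.extend(assign_indexes(folder.get("folders", []), index, path))
    return rows


tree = [
    {"name": "folder1", "folders": [
        {"name": "subfolder1"},
        {"name": "subfolder2"},
        {"name": "subfolder3", "folders": [{"name": "subSubfolder1"}]},
    ]},
    {"name": "folder2", "folders": [{"name": "subfolder2"}]},
]

rows = assign_indexes(tree)
for path, index in rows:
    print(index, path)
```

One caveat about the scheme itself: if these values are stored and sorted as numbers, 2 sorts before 11, so folder2 would come before subfolder1. Sorting the same digits as strings keeps the tree order, at least while no folder has more than nine children.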

Searchable objects:

[
    {
        "id": "3d4374ec-18a0-4e5b-bb55-e7576b475cdb",
        "name": "this object is in folder1",
        "folderIndex": 1
    },
    {
        "id": "3d4374ec-18a0-4e5b-bb55-e7576b475cdb",
        "name": "this object is in subSubfolder1",
        "folderIndex": 131
    },
    {
        "id": "2c2c02ec-3f57-4c85-886e-df6603718d44",
        "name": "this object is in subfolder1",
        "folderIndex": 11
    },
    ...
]

This would allow me to search by the name and order by the folder structure:

search=this object&$top=5000&searchFields=name&$orderby=folderIndex,name

When I add or change one object, or even a thousand objects, in a folder, it works fine: I just index or reindex those objects on the Azure Search side. But it doesn't scale. I may have hundreds of folders nested inside each other, and each of them may contain thousands of objects. If I reorganize the folders, it becomes a mess: I have to recalculate the index for almost all of the objects, starting from the top folder of the changed tree down to the bottom leaves.
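To make the reindexing cost concrete, here is a rough sketch that recomputes positional indexes before and after a reorganization and lists the folders whose index changed. The short folder IDs and the helper names `index_by_id` and `changed_folders` are hypothetical; the index is assumed to be the parent's index with the child's position appended.

```python
def index_by_id(folders, prefix=""):
    """Map folder id -> positional index (kept as a string of digits)."""
    mapping = {}
    for position, folder in enumerate(folders, start=1):
        index = f"{prefix}{position}"
        mapping[folder["id"]] = index
        mapping.update(index_by_id(folder.get("folders", []), index))
    return mapping


def changed_folders(before, after):
    """Ids of folders that are new or whose index differs after the change."""
    old, new = index_by_id(before), index_by_id(after)
    return sorted(fid for fid in new if old.get(fid) != new[fid])


before = [
    {"id": "A", "folders": [{"id": "A1", "folders": [{"id": "A1x"}]}]},
    {"id": "B", "folders": [{"id": "B1"}]},
]
# Reorganization: a new top-level folder "N" is placed in front of the others.
after = [{"id": "N"}] + before

changed = changed_folders(before, after)
print(changed)
```

In this toy tree a single insertion at the top shifts the index of every existing folder, so every object stored in any of them would need its folderIndex rewritten.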

This would be much easier with a relational structure, where I could store the folders with their indexes separately from the searchable objects, join them by folder ID, and still order by the folder index, but ...

Is there a way of doing this right?


Solution

  • Is the folder index kept only to order the result set by folder path? If so, why not store the full folder path as a sortable field in the original index? You could then order the result set by folder path, assuming the order you want is alphabetical.

    For example:

    Doc1: “field1”

    Doc2: “field1”

    Doc3: “field1\subfield11\subfield111”

    Doc4: “field2”
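A minimal sketch of this idea, using the folder tree from the question. The field name `folderPath`, the helper `folder_paths`, and the sample documents are assumptions made for illustration. Python's `sorted` compares strings by code point, which may not match the service's string ordering exactly, so treat the local sort only as a preview of the intended order.

```python
SEP = "\\"  # path separator, as in the example paths above


def folder_paths(folders, parent=""):
    """Map folder id -> full path built from the folder names."""
    paths = {}
    for folder in folders:
        path = parent + SEP + folder["name"] if parent else folder["name"]
        paths[folder["id"]] = path
        paths.update(folder_paths(folder.get("folders", []), path))
    return paths


tree = [
    {"id": "a595885e-520e-4fd2-9bdd-3f494f187b2e", "name": "folder1", "folders": [
        {"id": "f760f2bd-7291-49ed-9be2-9546ce57fb87", "name": "subfolder1", "folders": []},
    ]},
    {"id": "200ff3b6-310a-49d1-ad99-aed6f34a8f38", "name": "folder2", "folders": []},
]
paths = folder_paths(tree)

# Hypothetical searchable objects; only name and folderId matter here.
docs = [
    {"name": "b", "folderId": "200ff3b6-310a-49d1-ad99-aed6f34a8f38"},
    {"name": "a", "folderId": "f760f2bd-7291-49ed-9be2-9546ce57fb87"},
    {"name": "c", "folderId": "a595885e-520e-4fd2-9bdd-3f494f187b2e"},
    {"name": "a", "folderId": "a595885e-520e-4fd2-9bdd-3f494f187b2e"},
]

# Attach the path before uploading; the query would then order on it,
# e.g. $orderby=folderPath,name (assuming folderPath is marked sortable).
for doc in docs:
    doc["folderPath"] = paths[doc["folderId"]]

ordered = sorted(docs, key=lambda d: (d["folderPath"], d["name"]))
for doc in ordered:
    print(doc["folderPath"], doc["name"])
```

Because a parent's path is a prefix of its children's paths, objects in folder1 come first, then those in folder1\subfolder1, then those in folder2. Names containing characters that sort below the separator (for example `-`) could interleave with subfolder paths, so the choice of separator may need care.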