I am looking for a way to implement structural ordering for a search. I use Azure Search and have an index whose documents look like this (simplified):
[
{
"id": Guid,
"name": string,
"folderId": Guid
}
]
The name field is the one I execute search queries against, and folderId is, obviously, the folder the object lives in.
Suppose I have a folder structure:
[
{
"id": "a595885e-520e-4fd2-9bdd-3f494f187b2e",
"name": "folder1",
"searchObjects": [],
"folders": [
{
"id": "f760f2bd-7291-49ed-9be2-9546ce57fb87",
"name": "subfolder1",
"searchObjects": [],
"folders": []
}
]
},
{
"id": "200ff3b6-310a-49d1-ad99-aed6f34a8f38",
"name": "folder2",
"searchObjects": [],
"folders": []
}
]
Each of these folders has 3000 searchable objects. I want to paginate the search results and retrieve the pages according to the folder structure. For example, say I query 5000 objects with each request. In this case, I would get:
Page 1: 3000 items from folder1 + 2000 items from subfolder1;
Page 2: 1000 items from subfolder1 + 3000 items from folder2.
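To make the expected paging concrete, here is a minimal Python sketch that simulates it locally. It does not call Azure Search; the `FOLDERS` list and the helper names are made up for illustration and simply encode the example above (three folders in depth-first order, 3000 objects each, 5000 objects per page).

```python
from itertools import islice

# Folders in depth-first (structural) order, with the 3000 objects
# per folder from the example above. Hypothetical test data only.
FOLDERS = [("folder1", 3000), ("subfolder1", 3000), ("folder2", 3000)]


def ordered_items(folders):
    """Yield one folder name per object, in structural order."""
    for name, count in folders:
        for _ in range(count):
            yield name


def page_breakdown(folders, page_size):
    """Return, for each page, how many items each folder contributes."""
    items = ordered_items(folders)
    pages = []
    while True:
        chunk = list(islice(items, page_size))
        if not chunk:
            break
        counts = {}
        for name in chunk:
            counts[name] = counts.get(name, 0) + 1
        pages.append(counts)
    return pages


pages = page_breakdown(FOLDERS, 5000)
for number, counts in enumerate(pages, start=1):
    print(number, counts)
```

The first page is filled from folder1 and part of subfolder1, and the second page picks up where the first one stopped.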
My initial thought was to calculate a folder index before putting the searchable objects into Azure Search, e.g.:
[
{
"index": 1,
"name": "folder1",
"folders": [
{
"index": 11,
"name": "subfolder1"
},
{
"index": 12,
"name": "subfolder2"
},
{
"index": 13,
"name": "subfolder3",
"folders": [
{
"index": 131,
"name": "subSubfolder1"
}
]
}
]
},
{
"index": 2,
"name": "folder2",
"folders": [
{
"index": 21,
"name": "subfolder2"
}
]
}
]
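The numbering above can be produced by a small recursive walk. This is only a sketch of the scheme as I understand it: each folder's index is its parent's index with the folder's 1-based position appended. The function name `assign_indexes` and the in-memory `tree` are assumptions for illustration.

```python
def assign_indexes(folders, prefix=""):
    """Return (name, index) pairs in depth-first order.

    A folder's index is its parent's index with the folder's
    1-based position among its siblings appended as digits.
    """
    result = []
    for position, folder in enumerate(folders, start=1):
        index = f"{prefix}{position}"
        result.append((folder["name"], int(index)))
        result.extend(assign_indexes(folder.get("folders", []), index))
    return result


# Same tree as in the folder index example above.
tree = [
    {"name": "folder1", "folders": [
        {"name": "subfolder1"},
        {"name": "subfolder2"},
        {"name": "subfolder3", "folders": [{"name": "subSubfolder1"}]},
    ]},
    {"name": "folder2", "folders": [{"name": "subfolder2"}]},
]

indexes = assign_indexes(tree)

# Comparing the indexes as strings keeps the depth-first order.
as_strings = sorted(indexes, key=lambda pair: str(pair[1]))
# Comparing them as numbers interleaves the levels (1, 2, 11, ...).
as_numbers = sorted(indexes, key=lambda pair: pair[1])
```

One thing the sketch makes visible: sorted as numbers, index 2 (folder2) lands before 11 (subfolder1), so the depth-first order only survives if the values are compared as strings, and even then only while no folder has more than nine siblings.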
Searchable objects:
[
{
"id": "3d4374ec-18a0-4e5b-bb55-e7576b475cdb",
"name": "this object is in folder1",
"folderIndex": 1
},
{
"id": "3d4374ec-18a0-4e5b-bb55-e7576b475cdb",
"name": "this object is in subSubfolder1",
"folderIndex": 131
},
{
"id": "2c2c02ec-3f57-4c85-886e-df6603718d44",
"name": "this object is in subfolder1",
"folderIndex": 11
},
...
]
This would allow me to search by the name and order by the folder structure:
search=this object&$top=5000&searchFields=name&$orderby=folderIndex,name
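For completeness, paging through such a query could be sketched like this in Python. It only builds the query string; the `build_query` helper is hypothetical, and `searchFields` is written without the `$` prefix because, as far as I know, that is how the REST API spells it. `$skip` is the standard way to offset into the ordered result set.

```python
from urllib.parse import quote, urlencode


def build_query(page, page_size=5000):
    """Build the query string for a zero-based page number."""
    params = {
        "search": "this object",
        "searchFields": "name",
        "$orderby": "folderIndex,name",
        "$top": page_size,
        "$skip": page * page_size,
    }
    # Keep "$" and "," readable; encode spaces as %20.
    return urlencode(params, safe="$,", quote_via=quote)


first_page = build_query(0)
second_page = build_query(1)
print(first_page)
print(second_page)
```

As far as I know, the service caps `$top` at 1000 per request, so a 5000-item page would likely have to be assembled from several requests in practice.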
When I add or change one object, or even a thousand objects, in a folder, it works fine: I just index or reindex those objects on the Azure Search side. But it doesn't work at scale. I may have hundreds of folders nested inside each other, and each of these folders may contain thousands of objects. So if I reorganize the folders, it becomes a mess: I have to recalculate the folder index for almost all of the objects, from the top folder of the changed tree down to the bottom leaves.
This would be much easier with a relational structure, where I could store folders with their indexes separately from the searchable objects, join them by folder ID and order by the folder index all the same, but ...
Is there a way of doing this right?
Is the folder index kept only to order the result set by folder path? If so, why not keep the full folder path as a sortable field in the original index? That way you can order the result set by folder path, assuming the folder path order you want is alphabetical.
For example:
Doc1: "field1"
Doc2: "field1"
Doc3: "field1\subfield11\subfield111"
Doc4: "field2"
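A quick local check of that ordering, as a Python sketch. The field name `folderPath` is an assumption (it is not in the question's index), and the sort here is plain string comparison, which is only meant to approximate what ordering by a sortable string field would return.

```python
# The four example documents, deliberately shuffled.
# "folderPath" is a hypothetical sortable string field.
docs = [
    {"name": "Doc4", "folderPath": "field2"},
    {"name": "Doc3", "folderPath": "field1\\subfield11\\subfield111"},
    {"name": "Doc2", "folderPath": "field1"},
    {"name": "Doc1", "folderPath": "field1"},
]

# Rough equivalent of $orderby=folderPath,name
ordered = sorted(docs, key=lambda d: (d["folderPath"], d["name"]))
names = [d["name"] for d in ordered]
print(names)
```

A parent path sorts before its children because it is a prefix of them. Note that a sibling whose name continues the parent's name with a character that sorts before the separator (for example `field1-x` versus `field1\subfield11`) would be placed between the parent and its subfolders, so the separator and the allowed folder names may need some care.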