[SOLVED] Optimize the Gremlin Query running on Cosmos DB

Optimize the Gremlin Query running on Cosmos DB

I am creating a PoC using Azure Cosmos DB for Apache Gremlin to model and query/display the Organization Chart. Given a node, the service needs to traverse the graph and pull all children recursively so the UI can display the Organization Chart. For each traversed node, I need to get few attributes e.g. Name, ImageUrl, Direct Manager etc. I was able to put together a query that gives me the desired data but it seems to be very inefficient. It works but as I go up the hierarchy, it starts to timeout. Currently, I don't even have a very large dataset, just about a sample organization with ~500 employees and about 5 levels in handful of paths. The UI component requires the data to be flat i.e. id, name, ..., parentId.

Here is how the model looks like

Here is the query that is being used

g.V('rootUserId')
    .emit()
    .repeat(out('manages'))
    .until(__.not(outE('manages')))
    .path()
    .by(project('orgUser', 'reportsTo')
        .by(valueMap(true))
        .by(out('reportsTo')
            .valueMap(true)
            .dedup()
            .fold()))
    .unfold()
    .dedup()

How can this be optimized? Needless to say, I don't expect this to execute in few seconds. Since this data is slow changing, the queried data would be cached but I need to be able to query it given a root user which could be at level 2 or level 10 (based on UI selection). CosmosDB has a hard limit on execution time (30 seconds as per documentation but I am observing 60 seconds). Eventually, I would like this to scale to a few thousands employees organization. If this model and/or querying method is not the right approach, what is the recommended approach?

Solution

Simplified the data model by storing the "managerId" in each of the employee node thus removing the need to traverse back to manager's node during traversal. With that the query also got reduced to

g.V('{rootUserId}')
    .emit()
    .repeat(out('manages'))
    .until(__.not(outE('manages')))
    .valueMap(true)
    .dedup()

This executes within 4-5 seconds for about ~3000 nodes dataset.