amazon-web-services gremlin amazon-neptune azure-cosmosdb-gremlinapi gremlinnet

Does PartitionStrategy help Gremlin traversal performance?

I tried to play around with PartitionStrategy as described here: https://tinkerpop.apache.org/docs/current/reference/. Initially, I expected that when I defined a specific partition key for a zone and wrote some vertices to it, that zone would be indexed and vertex lookups would improve. Eventually, I realized that the partition key is just another property value defined on a vertex. In other words, this code is nothing more than a property value lookup, which leads to a full graph scan:

g.withStrategies(new PartitionStrategy(partitionKey: "_partition", writePartition: "a", 
                                     readPartitions: ["a"]));

I'm not sure what the underlying logic of PartitionStrategy is, but it does not seem to improve lookups if it really does a full graph scan. Correct me if I'm wrong.

Solution

From TinkerPop's perspective, PartitionStrategy just automatically modifies your Gremlin to take advantage of a particular property in the graph. TinkerPop doesn't know anything about your graph database's underlying indexing features, nor does it implement any. It is up to your graph to optimize such things. Some graphs might do that on their own, some might let you create indices that would improve the speed of PartitionStrategy, and others might do nothing at all, in which case PartitionStrategy may not perform well for every use case.
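To illustrate why indexing, not the strategy itself, decides the cost, here is a toy model in plain Python. It is a hypothetical sketch: `ToyVertexStore`, `create_index` and `has_within` are made-up names, and no real graph database works exactly like this. It only contrasts a property filter answered by scanning every vertex with the same filter answered from an index.

```python
from collections import defaultdict


class ToyVertexStore:
    """Hypothetical in-memory vertex store, NOT a real graph database."""

    def __init__(self):
        self.vertices = {}               # vertex id -> properties dict
        self.index = defaultdict(set)    # (key, value) -> set of vertex ids
        self.indexed_keys = set()        # property keys that have an index

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        # Keep any existing indexes up to date on write.
        for key in self.indexed_keys:
            if key in props:
                self.index[(key, props[key])].add(vid)

    def create_index(self, key):
        # Illustrative "create an index" feature some databases offer.
        self.indexed_keys.add(key)
        for vid, props in self.vertices.items():
            if key in props:
                self.index[(key, props[key])].add(vid)

    def has_within(self, key, values):
        """Return (matching vertex ids, number of vertices touched)."""
        if key in self.indexed_keys:
            hits = set()
            for value in values:
                hits |= self.index[(key, value)]
            return hits, len(hits)
        # No index on this key: every vertex must be inspected (full scan).
        hits = {vid for vid, props in self.vertices.items()
                if props.get(key) in values}
        return hits, len(self.vertices)


store = ToyVertexStore()
for i in range(1000):
    # Hypothetical data: every tenth vertex lives in partition "a".
    store.add_vertex(i, _partition="a" if i % 10 == 0 else "b")

scan_hits, scanned = store.has_within("_partition", ["a"])
store.create_index("_partition")
index_hits, touched = store.has_within("_partition", ["a"])
print(len(scan_hits), scanned, len(index_hits), touched)
```

Both lookups return the same vertices; only the amount of work differs. That is the part a graph database may or may not optimize for the `_partition` property.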

Going back to TinkerPop's perspective, the goal of PartitionStrategy (and SubgraphStrategy, for that matter) is more to simplify how Gremlin is written for use cases where parts of the graph need to be hidden. Without it, you would have lots and lots of repetitive filters mixed into your traversal, which would muddy its readability.

Consider this bit of code:

graph = TinkerGraph.open()
strategy = new PartitionStrategy(partitionKey: "_partition", writePartition: "a", readPartitions: ["a"])
g = traversal().withEmbedded(graph).withStrategies(strategy)
g.addV().addE('link')
g.V().out().out().out()

The traversal is quite readable and straightforward. It is easy to understand the intent: a three-step hop. But that's not really the traversal that executed. Roughly speaking, what executed was:

g.V().has('_partition',within("a")).
  out().has('_partition',within("a")).
  out().has('_partition',within("a")).
  out().has('_partition',within("a"))
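The filter injection can be sketched as a toy rewrite over step names. This is a simplified Python model, assuming steps are plain strings and using a hypothetical `apply_partition_filter` helper. The real PartitionStrategy operates on TinkerPop `Step` objects and also handles edges and vertex properties, so treat this only as an illustration of the idea that the partition check is just a `has()` filter appended after each vertex-producing step.

```python
def apply_partition_filter(steps, partition_key, read_partitions):
    """Hypothetical helper: append a has(key, within(...)) filter after
    every step that emits vertices. Not TinkerPop's implementation."""
    values = ",".join(f'"{p}"' for p in read_partitions)
    has_filter = f"has('{partition_key}',within({values}))"
    vertex_steps = {"V()", "out()", "in()", "both()"}
    rewritten = []
    for step in steps:
        rewritten.append(step)
        if step in vertex_steps:
            rewritten.append(has_filter)
    return rewritten


# The three-hop traversal as the user wrote it.
result = apply_partition_filter(["V()", "out()", "out()", "out()"],
                                "_partition", ["a"])
print("g." + ".".join(result))
```

Nothing in this rewrite creates an index; whether each injected `has()` is fast depends entirely on the underlying database.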

If you are using PartitionStrategy then you need to be sure it suits your graph database as well as your use case.