gremlin graph-databases tinkerpop azure-cosmosdb-gremlinapi gremlinnet

Need a query to retrieve complete graph

I am trying to retrieve all node and property details as a parent-child hierarchy, with children nested inside their parents. Since I am new to Gremlin and graph databases, I am having a really tough time getting this done.

Please suggest a solution; it would be great if you could walk me through it.

Following is my structure

I am trying to keep the response as clean as possible. I am using Cosmos DB and the Gremlin.NET API for this.

I tried the following, but they gave me the response as key-value pairs:

g.V("some_id").repeat(out()).emit().tree().path()
g.V("some_id").emit().repeat(both().simplePath()).dedup()

Any kind of suggestion would be great.

Solution

I'm not sure what format you want your result in, but path(), tree() or subgraph() would typically give you the graph structure. Since you are using CosmosDB, your only options are path() and tree(), as subgraph() does not appear to be supported.

Using this sample graph as a simple tree:

g.addV().property(id, '1').as('1').
  addV().property(id, '2a').as('2a').
  addV().property(id, '2b').as('2b').
  addV().property(id, '3a').as('3a').
  addV().property(id, '4a').as('4a').
  addE('child').from('1').to('2a').
  addE('child').from('1').to('2b').
  addE('child').from('2a').to('3a').
  addE('child').from('3a').to('4a')

you can see the effect of path() which basically gathers the contents of each step Gremlin took:

gremlin> g.V('1').repeat(out()).emit().path()
==>[v[1],v[2a]]
==>[v[1],v[2b]]
==>[v[1],v[2a],v[3a]]
==>[v[1],v[2a],v[3a],v[4a]]

Since I used out() we don't see the edges, but that is easily remedied by making a small adjustment to directly consume edges into the path history:

gremlin> g.V('1').repeat(outE().inV()).emit().path()
==>[v[1],e[0][1-child->2a],v[2a]]
==>[v[1],e[1][1-child->2b],v[2b]]
==>[v[1],e[0][1-child->2a],v[2a],e[2][2a-child->3a],v[3a]]
==>[v[1],e[0][1-child->2a],v[2a],e[2][2a-child->3a],v[3a],e[3][3a-child->4a],v[4a]]

Taken together, with duplicates removed on your application side, these paths give you the complete graph.
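As a rough illustration of that application-side deduplication, here is a minimal Python sketch. It assumes each path has already been reduced to a list of vertex ids (how your driver hands back path() results will differ, so that part is not shown), and the name paths_to_tree is made up for this example. Shared prefixes collapse because every path walks down the same nested dict:

```python
def paths_to_tree(paths):
    """Merge root-to-node paths into one nested dict; shared prefixes collapse."""
    tree = {}
    for path in paths:
        node = tree
        for element_id in path:
            # Reuse the existing branch if this id was already seen at this level.
            node = node.setdefault(element_id, {})
    return tree


# Vertex ids copied from the path() output above (assumed already extracted).
paths = [
    ["1", "2a"],
    ["1", "2b"],
    ["1", "2a", "3a"],
    ["1", "2a", "3a", "4a"],
]

print(paths_to_tree(paths))
# {'1': {'2a': {'3a': {'4a': {}}}, '2b': {}}}
```

The same loop works for the outE().inV() variant if you keep the edge ids in each path; they simply become extra levels in the nesting.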

Replacing path() with tree() will essentially do that deduplication by maintaining the tree structure of the path history:

gremlin> g.V('1').repeat(out()).emit().tree()
==>[v[1]:[v[2b]:[],v[2a]:[v[3a]:[v[4a]:[]]]]]
gremlin> g.V('1').repeat(outE().inV()).emit().tree()
==>[v[1]:[e[0][1-child->2a]:[v[2a]:[e[2][2a-child->3a]:[v[3a]:[e[3][3a-child->4a]:[v[4a]:[]]]]]],e[1][1-child->2b]:[v[2b]:[]]]]

The Tree is just represented as a Map where each key acts like a root and each value is another Tree (i.e. the branches from it). It is perhaps better visualized this way:

gremlin> g.V('1').repeat(out()).emit().tree().unfold()
==>v[1]={v[2b]={}, v[2a]={v[3a]={v[4a]={}}}}
gremlin> g.V('1').repeat(out()).emit().tree().unfold().next().value
==>v[2b]={}
==>v[2a]={v[3a]={v[4a]={}}}
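If it helps to see how such a nested Map can be consumed on the client, here is a small Python sketch. It assumes the tree() result has already been converted into plain nested dicts keyed by vertex id (the actual serialized shape depends on your driver and on CosmosDB, so treat that conversion as an exercise), and render is a hypothetical helper name:

```python
def render(tree, depth=0):
    """Return one indented line per key, recursing into each key's branches."""
    lines = []
    for key, branches in tree.items():
        lines.append("  " * depth + key)
        lines.extend(render(branches, depth + 1))
    return lines


# Same shape as the tree() output above, with vertex ids as keys (assumed).
tree = {"1": {"2b": {}, "2a": {"3a": {"4a": {}}}}}

print("\n".join(render(tree)))
```

Each level of nesting becomes one level of indentation, so the parent-child hierarchy is visible directly.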

If neither of these structures is suitable and subgraph() is not available, you can technically just capture and return the edges you traverse as the low-level elements of your subgraph, as described in this blog post.

Given the comments on this answer I also present the following option, which uses group():
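To make the "edges as the low-level elements" idea a bit more concrete, here is a hedged Python sketch of the client side. It assumes you have somehow collected the traversed edges as (out vertex id, label, in vertex id) tuples, possibly with repeats; that tuple shape and the name edges_to_subgraph are my own assumptions, not something the blog post or CosmosDB prescribes:

```python
def edges_to_subgraph(edges):
    """Deduplicate edges and derive the vertex set they touch."""
    vertices = set()
    unique_edges = []
    seen = set()
    for edge in edges:
        if edge in seen:
            continue  # the same edge can be reported by several traversers
        seen.add(edge)
        unique_edges.append(edge)
        out_id, _label, in_id = edge
        vertices.update((out_id, in_id))
    return vertices, unique_edges


# Edges of the sample graph, with one repeat (assumed tuple shape).
edges = [
    ("1", "child", "2a"),
    ("1", "child", "2b"),
    ("1", "child", "2a"),
    ("2a", "child", "3a"),
    ("3a", "child", "4a"),
]

vertices, unique_edges = edges_to_subgraph(edges)
print(sorted(vertices), len(unique_edges))
```

The vertex set plus the unique edge list is enough to rebuild the subgraph in whatever structure your application prefers.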

gremlin> g.V('1').emit().
......1>   repeat(outE().group('a').by(outV()).by(inV().fold()).inV()).cap('a').unfold()
==>v[1]=[v[2a], v[2b]]
==>v[3a]=[v[4a]]
==>v[2a]=[v[3a]]

It's not exactly a "tree" but if you know the root (in this case v[1]) you can find its key in the Map. The values are the children. You can then look up each of those keys in the Map to find if they have children and so on. For example, we can look up v[2b] and find that it has no children, while looking up v[2a] reveals a single child of v[3a]. Gremlin can be pretty flexible in getting answers if you can be sorta flexible in how you deal with the results.
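That lookup procedure is easy to automate on the client. Below is a minimal Python sketch, assuming the group() result has been turned into a dict from parent id to a list of child ids (the conversion from your driver's result type is not shown, and nest is a name invented for this example). It also assumes the data really is a tree; a cycle would make the recursion run forever:

```python
def nest(children_of, root):
    """Recursively look up each child's own children; a missing key means a leaf."""
    return {child: nest(children_of, child) for child in children_of.get(root, [])}


# Same content as the group() output above, keyed by vertex id (assumed shape).
children_of = {
    "1": ["2a", "2b"],
    "3a": ["4a"],
    "2a": ["3a"],
}

tree = {"1": nest(children_of, "1")}
print(tree)
# {'1': {'2a': {'3a': {'4a': {}}}, '2b': {}}}
```

Note that 2b never appears as a key in the map, which is exactly how you learn it has no children.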