[SOLVED] Gremlin query, get all the nodes 2 - 4 steps from the selected nodes

I would like to retrieve all the nodes that are 2 to 4 steps away from a selected node.

My idea was to split the query into two repeat().times(2) steps so that I can refer to each of them later. Since my graph is highly connected, I try to follow only the shortest path to each node:

g.V().has('name', 'a').store('x')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('exclude')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('keep')
.select('keep')
.values('name')
.dedup()

This does not return the correct result.

In fact, if I change this query to

g.V().has('name', 'a').store('x')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('exclude')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('keep')
.select('exclude')
.values('name')
.dedup()

I only get the source node ('a') back.

What is the correct way to do this? I am using Amazon Neptune with a Jupyter graph notebook.

Solution

Doing the repeat twice within a union may be the "brute force" way of doing this. There's likely a better way: traverse the graph once and then pull out only the parts that you need:

g.withSack(0).V().has('name','a').
    repeat(
        sack(sum).by(constant(1)).
        out().simplePath()
    ).emit().times(4).
group().
    by(sack()).
unfold().
    where(
        select(keys).is(within(2, 3, 4))
    ).
select(values).unfold().
values('name')

This uses sack() to track depth as you traverse the graph. You can then group by depth and return only the items at the depths you want.

https://www.kelvinlawrence.net/book/PracticalGremlin.html#sackintro
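
To make the depth-tracking idea concrete, here is a minimal Python sketch of the same logic on a small in-memory graph. It is not Gremlin and does not talk to Neptune; the GRAPH adjacency list and the function names names_by_depth and names_in_range are hypothetical, made up only for illustration. The depth counter stands in for the sack, the "already on this path" check stands in for simplePath(), and the final filter stands in for grouping by depth and keeping depths 2 to 4.

```python
from collections import defaultdict

# Hypothetical toy graph (assumption, not from the question):
# an adjacency list of out-edges keyed by vertex name.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": ["f"],
    "f": ["g"],
    "g": [],
}


def names_by_depth(graph, start, max_depth):
    """Follow every simple path from start for up to max_depth hops and
    group each vertex reached by the number of hops taken to reach it."""
    groups = defaultdict(list)
    # Each entry is (path so far, depth); depth plays the role of the sack.
    frontier = [([start], 0)]
    while frontier:
        path, depth = frontier.pop()
        if depth == max_depth:
            continue  # same effect as times(max_depth)
        for nxt in graph.get(path[-1], []):
            if nxt in path:
                continue  # same effect as simplePath(): no repeated vertex
            groups[depth + 1].append(nxt)  # same effect as emit()
            frontier.append((path + [nxt], depth + 1))
    return groups


def names_in_range(graph, start, lo, hi):
    """Return the distinct vertex names found at depths lo..hi inclusive."""
    groups = names_by_depth(graph, start, hi)
    found = {
        name
        for depth, names in groups.items()
        if lo <= depth <= hi
        for name in names
    }
    return sorted(found)


print(names_in_range(GRAPH, "a", 2, 4))  # prints ['d', 'e', 'f']
```

In this toy graph 'd' is reached twice at depth 2 (via 'b' and via 'c'), which is why the sketch collects names into a set; the Gremlin query would need a dedup() for the same reason. Note also that tracking depth along simple paths is not the same as shortest-path distance: a vertex reachable by both a short and a long simple path appears at both depths.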