gremlin graph-databases amazon-neptune

Gremlin query: get all the nodes 2 to 4 steps from a selected node


I would like to retrieve all the nodes that are 2 to 4 steps away from a selected node.

My thought was that I could split the query into two repeat().times(2) steps so that I can refer to each of them later.

Since my graph is highly connected, I try to follow only the shortest paths:

g.V().has('name', 'a').store('x')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('exclude')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('keep')
.select('keep')
.values('name')
.dedup()

This does not return the correct result.

In fact, if I change this query to

g.V().has('name', 'a').store('x')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('exclude')
.emit().repeat(outE().inV().where(without('x')).aggregate('x')).times(2).as('keep')
.select('exclude')
.values('name')
.dedup()

I only get the source node ('a') back.

What is the correct way to do this? I am using Amazon Neptune with a Jupyter graph notebook.


Solution

  • Doing the repeat twice within a union may be the "brute force" way of doing this. There is likely a better way: traverse the graph once and then pull out only the parts that you need:

    g.withSack(0).V().has('name','a').
        repeat(
            sack(sum).by(constant(1)).
            out().simplePath()
        ).emit().times(4).
    group().
        by(sack()).
    unfold().
        where(
            select(keys).is(within([2,3,4]))
        ).
    select(values).unfold().
    values('name')
    

    This uses sack() to track depth as you traverse the graph: each pass through repeat() adds 1 to the sack. You can then group by depth and return only the items at the depths you want.

    https://www.kelvinlawrence.net/book/PracticalGremlin.html#sackintro
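
To make the depth-tracking idea concrete, here is a minimal plain-Python sketch of what the traversal above computes. It is not Gremlin and does not talk to Neptune. The toy graph `EDGES` and the helper names `names_by_depth` and `within_depths` are hypothetical, made up for illustration. The sketch also collects names into sets, so duplicates are dropped, whereas group() in the query keeps them.

```python
from collections import defaultdict

# Hypothetical toy graph (not from the question): directed adjacency lists.
EDGES = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": ["f"],
    "f": ["a"],
}


def names_by_depth(edges, start, max_depth):
    """Follow every simple path of up to max_depth hops from start and
    record which vertices are reached at each depth."""
    by_depth = defaultdict(set)
    # Each entry carries (vertex, path so far, depth counter).
    # The depth counter plays the role of the sack in the query.
    stack = [(start, (start,), 0)]
    while stack:
        vertex, path, depth = stack.pop()
        if depth > 0:
            by_depth[depth].add(vertex)   # like emit(): keep every hop, not just the last
        if depth == max_depth:
            continue                      # like times(max_depth): stop extending this path
        for nxt in edges.get(vertex, []):
            if nxt not in path:           # like simplePath(): no vertex twice on one path
                stack.append((nxt, path + (nxt,), depth + 1))
    return by_depth


def within_depths(edges, start, lo, hi):
    """Return the sorted names found at any depth from lo to hi inclusive."""
    by_depth = names_by_depth(edges, start, hi)
    found = set()
    for depth in range(lo, hi + 1):
        found |= by_depth.get(depth, set())
    return sorted(found)


result = within_depths(EDGES, "a", 2, 4)
print(result)  # ['d', 'e', 'f']
```

As in the query, a vertex is reported at every depth at which some simple path reaches it, not only at its shortest distance from the start. If you need shortest distances only, an extra filtering step would be required.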