[SOLVED] The most efficient way to compare 2 aggregates (set of vertices) in Gremlin Query Language

What's the most efficient way to compare 2 aggregates (set of vertices) and see if they contain the exact same results?

Using this doesn't seem to work:

.where(select('aggregate1').is(eq('aggregate2')))

aggregate1 and aggregate2 each contain a bunch of vertices.

Solution

Evaluating equality of two sets should work, but the comparison is order-sensitive, so both will likely need to be sorted the same way. For example:

g.inject([1,2,3]).is(eq([1,2,3]))

... provides a match and returns ...

1   [1, 2, 3]

However, if I change the order of the second set, as in...

g.inject([3,2,1]).is(eq([1,2,3]))

... then nothing gets returned (showing non-equivalence).

You would need to insert an order(local) into the query to ensure the first set is ordered to match the second:

g.inject([3,2,1]).order(local).is(eq([1,2,3]))

Returns:

1   [1, 2, 3]
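
The same idea can be carried over to two sets of vertices. The following is only a rough sketch, not part of the original answer: it assumes the air-routes dataset used further down and vertex ids that can be sorted, and the labels 'a' and 'b' are illustrative names. Each result set is deduplicated, sorted by id and folded into a list, and where('a', eq('b')) then compares the two lists. Note that is(eq('aggregate2')) compares against the literal string 'aggregate2', whereas where() treats its string arguments as labels to look up, which is probably why the query in the question matches nothing.

```groovy
// Sketch under the assumptions stated above; 'a' and 'b' are made-up labels.
// Each side is deduplicated, sorted by vertex id, and folded into one list,
// so the order-sensitive eq() comparison behaves like a set comparison.
g.V().hasLabel('airport').has('code','MLI').
    out('route').dedup().order().by(id).fold().as('a').
    V().hasLabel('airport').has('code','ABE').
    out('route').dedup().order().by(id).fold().as('b').
    where('a', eq('b'))
```

If the two lists are identical, the traversal should emit the list labelled 'b'; otherwise it should return nothing.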

As an aside...

If you are looking for the overlap or the difference between the two sets, you could use where(within('x')) or where(without('x')) to get the intersection or the relative complement, respectively.

As an example using the airroutes dataset, if I wanted to find the common destination airports flying from MLI and ABE, I could use the following query:

g.V().hasLabel('airport').has('code','MLI').
    out('route').
    aggregate('mlir').limit(1).
    V().hasLabel('airport').has('code','ABE').
    out('route').
    aggregate('aber').
    where(within('mlir')).values('code')

Returns:

1   PIE
2   ATL
3   ORD
4   SFB
5   DTW

If I wanted to see which destination airports are reachable from ABE but not from MLI, I could use:

g.V().hasLabel('airport').has('code','MLI').
    out('route').
    aggregate('mlir').limit(1).
    V().hasLabel('airport').has('code','ABE').
    out('route').
    aggregate('aber').
    where(without('mlir')).values('code')

Result:

1   CLT
2   MYR
3   BNA
4   FLL
5   PGD
6   PHL
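
To tie this back to the original question, the without() query can also serve as an equality check. As a hedged sketch built on the query above (the unused 'aber' aggregate is dropped, and count() replaces values('code')), a count of 0 would mean every ABE destination is also an MLI destination:

```groovy
// Sketch only: counts the ABE destinations that are not MLI destinations.
// A result of 0 means ABE's destinations are a subset of MLI's.
g.V().hasLabel('airport').has('code','MLI').
    out('route').
    aggregate('mlir').limit(1).
    V().hasLabel('airport').has('code','ABE').
    out('route').
    where(without('mlir')).count()
```

Running the same query with MLI and ABE swapped covers the other direction. If both counts are 0, the two sets contain the same vertices.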