Tags: gremlin, amazon-neptune, opencypher

Neptune Gremlin graph traversal that uses a previous edge's property to decide the next edge, where the property is a collection


My requirement is to use properties of the incoming edge, which can be sets of strings, to decide which outgoing edges to select for further traversal. If any value in one collection is found in the other collection, I want to select that edge for traversal. A set can contain up to 1,000 values.
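To make the selection rule concrete, here is a minimal plain-Python model of it (not Neptune or Gremlin code). It assumes a hypothetical in-memory adjacency list in which each edge carries its collection as a Python set; `next_edges` and `traverse` are illustrative names. An outgoing edge is followed only if its set shares at least one value with the set on the edge that was just traversed.

```python
from collections import deque

# Hypothetical in-memory graph (not a Neptune API):
# vertex -> list of (edge_id, target_vertex, tags), where tags is the edge's set of strings.
Graph = dict[str, list[tuple[str, str, frozenset[str]]]]


def next_edges(graph: Graph, vertex: str, incoming_tags: frozenset[str]):
    """Return the outgoing edges of `vertex` whose tag set shares at least
    one value with the tag set of the edge we arrived on."""
    return [
        (edge_id, target, tags)
        for edge_id, target, tags in graph.get(vertex, [])
        if not incoming_tags.isdisjoint(tags)  # any common value selects the edge
    ]


def traverse(graph: Graph, start: str) -> list[str]:
    """Breadth-first walk: every edge out of `start` is followed; after that,
    only edges whose tags intersect the previous edge's tags are followed.
    Returns the ids of the visited edges in visit order."""
    visited: list[str] = []
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        edge_id, target, tags = queue.popleft()
        if edge_id in seen:  # guard against cycles
            continue
        seen.add(edge_id)
        visited.append(edge_id)
        queue.extend(next_edges(graph, target, tags))
    return visited


graph: Graph = {
    "a": [("e1", "b", frozenset({"x", "y"}))],
    "b": [("e2", "c", frozenset({"y", "z"})), ("e3", "d", frozenset({"q"}))],
    "c": [("e4", "d", frozenset({"z"}))],
}
print(traverse(graph, "a"))  # ['e1', 'e2', 'e4']
```

The first hop is unconstrained because there is no incoming edge yet; how the first edge should be chosen is an assumption of this sketch.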

However, Neptune does not allow set cardinality on edge properties.


A workaround has been proposed for openCypher, but I am not sure whether an equivalent is available for Gremlin. Using a string split() does not appear to be an option in Gremlin unless lambdas are involved, but the TinkerPop documentation discourages lambdas, and when I try to check whether Neptune supports Gremlin lambdas, I only find content about AWS Lambda.


Solution

  • The aforementioned workaround applies to node properties stored as lists in openCypher. As of today, Neptune only supports "single" and "set" cardinality for node/vertex properties. The workaround stores the values as stringified lists and uses openCypher string functions to work with them.

    Gremlin now has a similar set of string-based steps (added in TinkerPop 3.7.x), and Neptune supports them as of engine version 1.3.2.0: https://docs.aws.amazon.com/neptune/latest/userguide/engine-releases-1.3.2.0.html

    You could store the list of values as a single string and use Gremlin's string steps, similar to how the string functions are used in openCypher:

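    // Store the collection as a single comma-delimited string property
    g.addV('Test').property('names','Alvin,Simon,Theodore')
    
    // At query time, split the string back into a list and keep only vertices
    // whose list shares at least one value with ['Alvin']
    g.V().hasLabel('Test').
        where(values('names').split(',').intersect(['Alvin']).unfold())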
    

    Be aware that this is not an indexed lookup. In the example above, every node/vertex with the label Test has to be fetched, and every names string property has to be split to check whether it contains the value being looked for.
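The filter in the Gremlin query above can be modeled in plain Python to show both its semantics and why it cannot use an index. This is a sketch, not Neptune code: `encode`, `matches` and `scan` are hypothetical helpers, and it assumes that no stored value ever contains the comma separator (a stringified list breaks if one does). `matches` mirrors `values('names').split(',').intersect([...]).unfold()` inside `where()`: the vertex passes if the intersection is non-empty. The label check narrows the candidates, but every remaining `names` string still has to be split.

```python
# Plain-Python model of the stringified-list workaround; not a Neptune API.
# Each element stores its collection as one comma-delimited string property,
# mirroring g.addV('Test').property('names','Alvin,Simon,Theodore').
SEP = ","  # assumption: no stored value ever contains the separator


def encode(values: set[str]) -> str:
    """Flatten a set into a single string property (sorted for a stable value)."""
    if any(SEP in v for v in values):
        raise ValueError("a value contains the separator")
    return SEP.join(sorted(values))


def matches(stored: str, wanted: set[str]) -> bool:
    """Model of values('names').split(',').intersect(wanted).unfold():
    true when at least one split value is in `wanted`."""
    return not wanted.isdisjoint(stored.split(SEP))


def scan(vertices: list[dict], label: str, wanted: set[str]) -> tuple[list[str], int]:
    """Filter vertices by label, then split and test every 'names' string.
    Returns the matching ids and how many strings had to be split."""
    hits, split_count = [], 0
    for v in vertices:
        if v["label"] != label:
            continue
        split_count += 1  # no index helps here: every candidate is split
        if matches(v["names"], wanted):
            hits.append(v["id"])
    return hits, split_count


vertices = [
    {"id": "v1", "label": "Test", "names": encode({"Alvin", "Simon", "Theodore"})},
    {"id": "v2", "label": "Test", "names": encode({"Dave"})},
    {"id": "v3", "label": "Other", "names": encode({"Alvin"})},
]
print(scan(vertices, "Test", {"Alvin"}))  # (['v1'], 2)
```

The same `matches` check would apply hop by hop to a comma-delimited edge property, which is the case in the question, with the same cost: every candidate edge's string is split at query time.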

    If you need to store data like this at scale, I would consider remodeling the data structure in one of two ways: