We are using the Gremlin JavaScript language variant and Amazon Neptune in our project and we have multiple use cases for the creation of vertices and edges in batch.
A simple example would be an array of 200 - 1000 users. I need to perform a batch query that checks whether the user exists or not. If the user exists then add the vertices with the properties else ignore that user. All these conditions need to be done in batch.
Note: I need to avoid submitting Gremlin scripts, so I am looking for a traversal-based solution.
It is possible to seed a query with a list of maps containing the data to be inserted. You can further extend the pattern to use a coalesce step to do conditional inserts. Using the air-routes data set, here is a simple example that creates a new XYZ airport and finds that the other airports already exist. Note that the mid-traversal V step makes this a somewhat expensive query because, for each map in the list, all vertices have to be "searched".
g.inject([['code':'AUS'],['code':'XYZ'],['code':'SFO']]).
  unfold().as('data').
  coalesce(V().hasLabel('airport').
             where(eq('data')).
               by('code').
               by(select('code')),
           addV('airport').
             property('code',select('code')))
There are additional discussions of using this pattern to avoid long chains of addV and addE steps in a query in the TinkerPop recipes:
https://tinkerpop.apache.org/docs/current/recipes/#long-traversals
When the query is run you can see that a new ID is created for the XYZ airport and the existing IDs are found for the others.
gremlin> g.inject([['code':'AUS'],['code':'XYZ'],['code':'SFO']]).
......1> unfold().as('data').
......2> coalesce(V().hasLabel('airport').
......3> where(eq('data')).
......4> by('code').
......5> by(select('code')),
......6> addV('airport').
......7> property('code',select('code')))
==>v[3]
==>v[61286]
==>v[23]
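Since the question uses the Gremlin JavaScript variant, the same inject/unfold/coalesce pattern can be sketched there as well. This is a minimal sketch rather than a tested Neptune recipe. The helper names chunk, toMaps and buildUpsert are hypothetical, and the batch size is whatever you choose. The traversal source g and the __ and P helpers are passed in by the caller, and the rows are built as JavaScript Map objects on the assumption that the driver serializes a Map as a Gremlin map.

```javascript
// Hypothetical helper: split a large input array into smaller batches so
// that each request carries a bounded number of maps.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: convert plain objects such as { code: 'AUS' } into
// Map instances (assumption: the driver sends a Map as a Gremlin map).
function toMaps(records) {
  return records.map((record) => new Map(Object.entries(record)));
}

// Build the conditional insert as a traversal, mirroring the Groovy query above.
// g  : a GraphTraversalSource supplied by the caller
// __ : the anonymous traversal helpers (gremlin.process.statics)
// P  : the predicate helpers (gremlin.process.P)
function buildUpsert(g, __, P, rows) {
  return g
    .inject(rows) // the whole list is injected as a single traverser
    .unfold() // one traverser per map
    .as('data')
    .coalesce(
      // First branch: return the existing airport whose code matches the map.
      __.V()
        .hasLabel('airport')
        .where(P.eq('data'))
        .by('code')
        .by(__.select('code')),
      // Second branch: runs only if no match was found, so a vertex is created.
      __.addV('airport').property('code', __.select('code'))
    );
}
```

Assuming the gremlin package is installed and g is already connected, usage might look like taking __ and P from require('gremlin').process (as process.statics and process.P), then running await buildUpsert(g, __, P, toMaps(batch)).toList() for each batch returned by chunk(users, 100). The batch size of 100 is only an illustration. Smaller batches keep each request short, but every map still triggers the mid-traversal V lookup noted above.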