graph gremlin tinkerpop tinkerpop-blueprint

How do I write Gremlin queries for following patterns?

I have some miniature Gremlin directed graphs, in which each vertex has two properties "type" and "text". The values for "text" property are just English text while the "type" property can a have a value selected from this set :

NP, PP, VP, ADVP, ADJP, SBAR, PRT, INTJ, O

All edges in those graphs have same label : "next".

I want to be able to select the graphs which have following patterns of nodes:

1) [text=","] --> type="VP" --> type="ADVP" --> type="NP"
2) type="NP" --> [text="," Upto 3 nodes with any text and type text=","] --> type="VP" --> [text=":" OR "that"]

The pattern element in brackets means that it is optional.

So, for first pattern, I need to select the graphs which have a node with text "," optionally, followed by a node with type "VP", followed by "ADVP", followed "NP".

For second pattern, I need to select the graphs which have a node type "NP", followed by an optional sequence of nodes starting with a node with text "," then upto 3 nodes with any text and type and then a node with text ",". This optional sequence is then followed by a node of type "VP" and finally a node with text ":" or "that".

Two sample graphs that would match with first pattern are :

Following are sample graphs that would match with second pattern:

I understand basic Gremlin traversals but I am not sure about how to handle optional elements of the pattern above.

Is there any way to write queries for such patterns in Gremlin? If not, can you suggest a non-Gremlin based approach to create such graphs and querying them ?

Solution

You can do pattern matching in Gremlin as of TinkerPop 3.0. You would use Match Step to accomplish your task. I've written the Gremlin to do so as an example for your first requirement. Perhaps that will inspire you to develop the traversal for your second requirement.

I generated some data as follows:

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v1=g.addV(id, 1, "type", "o", "text", ",").next()
==>v[1]
gremlin> v2=g.withSideEffect('x',v1).addV(id, 2, "type", "vp", "text", "a").addInE('next','x').inV().next()
==>v[2]
gremlin> v3=g.withSideEffect('x',v2).addV(id, 3, "type", "advp", "text", "b").addInE('next','x').inV().next()
==>v[3]
gremlin> g.withSideEffect('x',v3).addV(id, 4, "type", "np", "text", "c").addInE('next','x').inV().next()
==>v[4]
gremlin> 
gremlin> v5=g.addV(id, 5, "type", "vp", "text", "a").next()
==>v[5]
gremlin> v6=g.withSideEffect('x',v5).addV(id, 6, "type", "advp", "text", "b").addInE('next','x').inV().next()
==>v[6]
gremlin> g.withSideEffect('x',v6).addV(id, 7, "type", "np", "text", "c").addInE('next','x').inV().next()
==>v[7]
gremlin> 
gremlin> v8=g.addV(id, 8, "type", "vp", "text", "a").next()
==>v[8]
gremlin> v9=g.withSideEffect('x',v8).addV(id, 9, "type", "o", "text", ",").addInE('next','x').inV().next()
==>v[9]
gremlin> g.withSideEffect('x',v9).addV(id, 10, "type", "np", "text", "c").addInE('next','x').inV().next()
==>v[10]

Then for the matching traversal:

gremlin> g.V().has('type','vp').match(__.as('vp').coalesce(__().in().has('text',','),constant("optional")).as('o'),
gremlin>                              __.as('vp').out().has('type','advp').as('advp'),
gremlin>                              __.as('advp').out().has('type','np').as('np')).select('o','vp','advp','np')
==>[o:v[1], vp:v[2], advp:v[3], np:v[4]]
==>[o:optional, vp:v[5], advp:v[6], np:v[7]]