I'm trying to write a Gremlin query that returns the traversed vertices and edges (with their properties) for the most complex starting vertices, i.e. those with the highest count of related vertices.
In other words, I want to retrieve the patients with the most codes, but there is no direct relationship between Patients and Codes. This is the relationship and direction: Patient->Diagnosis<-Code
Here is my attempt:
g.V().hasLabel('Patient').
outE().inV().
inE().outV().
path().
by(elementMap()).
order().
by(count(local), asc).
tail(2).
unfold().
toList()
I wanted this to return patient vertices with their traversed edges/vertices, only the top 2 based on the count of codes returned per patient. This is what I got:
single patient vertex with traversed edges/nodes
Here is a sample insert to replicate the same relationships:
g
.addV('pat').property(id, 'p-0')
.addV('pat').property(id, 'p-1')
.addV('pat').property(id, 'p-2')
.addV('diag').property(id, 'd-0')
.addV('diag').property(id, 'd-1')
.addV('diag').property(id, 'd-2')
.addV('code').property(id, 'c-0')
.addV('code').property(id, 'c-1')
.V('p-0').addE('contracted').to(V('d-0'))
.V('p-0').addE('contracted').to(V('d-1'))
.V('p-0').addE('contracted').to(V('d-2'))
.V('p-1').addE('contracted').to(V('d-1'))
.V('p-2').addE('contracted').to(V('d-2'))
.V('c-0').addE('includes').to(V('d-0'))
.V('c-1').addE('includes').to(V('d-0'))
.V('c-1').addE('includes').to(V('d-1'))
.V('c-2').addE('includes').to(V('d-1'))
This is an example of the format I would like to return: I used ".path().by(elementMap()).unfold().toList()" after the vertex and edge steps to get this.
I want the output to be the vertices and edges that will produce a graph like this:
As you can see, out of three patients, I want to return the top 2 most complex patients (based on the number of codes their diagnoses have). I don't want to return the patient with just one code.
Thanks for providing the sample graph. That really helps. Running this query is a quick way to see the whole graph visually.
g.V().hasLabel('pat').
outE().inV().
inE().outV().
simplePath().
path().by(elementMap())
Which, using graph-notebook, produces:
To find the number of codes for each starting patient, we might do this. It builds on the prior query but filters using edge labels.
g.V().hasLabel('pat').as('p').
out('contracted').
group().
by(select('p').id()).
by(in('includes').count())
which gives us the number of codes associated with each patient:
{'p-0': 3, 'p-2': 0, 'p-1': 1}
However, you may not want this double counting when a code is shared by more than one diagnosis. In that case we can dedup the results.
g.V().hasLabel('pat').as('p').
out('contracted').
group().
by(select('p').id()).
by(in('includes').dedup().count())
which reduces the count for p-0 to 2 and removes p-2 completely, as it has no codes.
{'p-0': 2, 'p-1': 1}
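As a cross-check on the two counts above, here is a minimal sketch in plain Python (not Gremlin) that models the sample graph as edge lists. It assumes the final `c-2` edge is never created, since the sample insert adds no `c-2` vertex; that assumption matches the counts shown above. The helper names `codes_per_patient` and `top_patients` are hypothetical, made up for this illustration. Unlike the Gremlin output, the dict here keeps `p-2` with a count of 0, and `top_patients` filters it out.

```python
# Sample graph from the question as plain edge lists (no graph database needed).
# Assumption: the c-2 -> d-1 edge is omitted because no 'c-2' vertex is ever added.
contracted = [("p-0", "d-0"), ("p-0", "d-1"), ("p-0", "d-2"),
              ("p-1", "d-1"), ("p-2", "d-2")]
includes = [("c-0", "d-0"), ("c-1", "d-0"), ("c-1", "d-1")]


def codes_per_patient(contracted, includes):
    """Map each patient to the codes reached through its diagnoses (repeats kept)."""
    codes_by_diag = {}
    for code, diag in includes:
        codes_by_diag.setdefault(diag, []).append(code)
    reached = {}
    for patient, diag in contracted:
        reached.setdefault(patient, []).extend(codes_by_diag.get(diag, []))
    return reached


def top_patients(counts, n):
    """Return up to n patient ids with the highest non-zero counts."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [patient for patient, count in ranked[:n] if count > 0]


reached = codes_per_patient(contracted, includes)
raw_counts = {p: len(codes) for p, codes in reached.items()}            # double counts shared codes
distinct_counts = {p: len(set(codes)) for p, codes in reached.items()}  # mirrors dedup()
top_two = top_patients(distinct_counts, 2)

print(raw_counts)
print(distinct_counts)
print(top_two)
```

Ranking by the distinct count and taking the first two entries is one way to express "top 2 most complex patients" without hard-coding a particular count value.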
UPDATED
Based on additional discussion in the comments, this query uses the grouped counts as a filter.
g.V().hasLabel('pat').as('p').
  outE('contracted').inV().
  where(
    group().
      by(select('p').id()).
      by(in('includes').dedup().count()).
    select(values).unfold().is(2)).
  inE().outV().
  path().by(elementMap())
When rendered visually