I'm using Titan with Cassandra and have several (related) questions about querying the database with Gremlin:
1.) Is there an faster way to count all vertices than
g.V.count()
Titan claims to use an index. But how can I use an index without property?
WARN c.t.t.g.transaction.StandardTitanTx - Query requires iterating over all vertices [<>]. For better performance, use indexes
2.) Is there an faster way to count all vertices with property 'myProperty' than
g.V.has('myProperty').count()
Again Titan means following:
WARN c.t.t.g.transaction.StandardTitanTx - Query requires iterating over all vertices [(myProperty<> null)]. For better performance, use indexes
But again, how can I do this? I already have an index of 'myProperty', but it needs a value to query fast.
3.) And the same questions with edges...
Iterating all vertices with g.V.count()
is the only way to get the count. It can't be done "faster". If your graph is so large that it takes hours to get an answer or your query just never returns at all, you should consider using Faunus. However, even with Faunus you can expect to wait for your answer (such is the nature of Hadoop...no sub-second response here), but at least you will get one.
Any time you do a table scan (i.e. iterate all Vertices) you get that warning of "iterating over all vertices". Generally speaking, you don't want to do that, as you will never get a response. Adding an index won't help you count all vertices any faster.
Edges have the same answer. Use g.E.count()
in Gremlin if you can. If it takes too long, then try Faunus so you can at least get an answer.