Why is JanusGraph not finding-and-using the index I made at time of search?
Please note, I need this to work with the Java implementation specifically; not Gremlin Console.
And this needs to work with the latest JanusGraph releases; not older versions,
Especially not with different major-version release-numbers.
[WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
So I expect for class Vertex
(which is all of vertexes) to be using the index.
But what I actually see is the warning that this is not so; and
I experience is data returned and changed to be slower than expected.
2023-05-09 15:18:38,678 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:38,759 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] :: DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:18:39,215 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:18:39,468 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=60cfcdc3)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,691 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] :: Generated unique-instance-id=c0a8563c21376-rmt-lap-win201
2023-05-09 15:18:39,707 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:39,732 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:18:39,766 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=3d0c7306)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,786 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] :: Initiated fixed thread pool of size 40
2023-05-09 15:18:39,897 [INFO] [o.j.g.d.StandardJanusGraph.main] :: Gremlin script evaluation is disabled
2023-05-09 15:18:39,919 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-09T20:18:39.919098Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-09 15:18:39,971 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-09 15:18:41,311 [INFO] [Main.main] :: g.V().count().next(): 69999
2023-05-09 15:18:41,314 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
The index seems to exist, because trying to create will cause IllegalArgumentException: An index with name '_id' has already been defined
.
I don't delete the index in-between runs, so there should be more than enough time for JanusGraph to create the index, especially if I let the container run overnight.
...
2023-05-09 15:21:38,946 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] :: Initiated fixed thread pool of size 40
2023-05-09 15:21:39,053 [INFO] [o.j.g.d.StandardJanusGraph.main] :: Gremlin script evaluation is disabled
2023-05-09 15:21:39,079 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-09T20:21:39.079040Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalArgumentException: An index with name '_id' has already been defined
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
at org.janusgraph.graphdb.database.management.ManagementSystem.checkIndexName(ManagementSystem.java:661)
at org.janusgraph.graphdb.database.management.ManagementSystem.createCompositeIndex(ManagementSystem.java:728)
at org.janusgraph.graphdb.database.management.ManagementSystem.access$300(ManagementSystem.java:130)
at org.janusgraph.graphdb.database.management.ManagementSystem$IndexBuilder.buildCompositeIndex(ManagementSystem.java:824)
at Test3.main(Test3.java:24)
I've done this before with OrientDB and thought I understood how to work with Tinkerpop3 + Gremlin-related graph-databases then. But as I'm transitioning to JanusGraph from OrientDB they don't seem to have the same process of using indexes after they've been made.
OrientGraph orientGraph = OrientGraph.open(configuration);
if (!orientGraph.getVertexIndexedKeys("V").contains("_id"))
orientGraph.createVertexIndex("_id", "V", configuration);
orientGraph.commit();
JanusGraph never seems to make the index requested; same thing as before but now:
...
2023-05-11 07:49:16,383 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-11T12:49:16.382971Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6ad16c5d
2023-05-11 07:49:16,512 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] :: Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
...
2023-05-11 07:50:16,913 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] :: Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-11 07:50:16,998 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,020 [INFO] [Main.main] :: g.V().count().next(): 0
2023-05-11 07:50:17,025 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,032 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,038 [INFO] [Main.main] :: g.V().count().next(): 0
Process finished with exit code 0
Awaiting now for JanusGraph to create index by casting JanusGraph.openManagement()
from JanusGraphManagement
to ManagementSystem
.
I've already tried variations of janusGraphManagement.rollback()
even though this is a brand-new container with no data. Here's the log output to show what messages I get:
...
2023-05-11 08:16:16,302 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-11T13:16:16.302033Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: Operation cannot be executed because the enclosing transaction is closed
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.verifyOpen(StandardJanusGraphTx.java:369)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getRelationType(StandardJanusGraphTx.java:1080)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getOrCreatePropertyKey(StandardJanusGraphTx.java:1142)
at org.janusgraph.graphdb.database.management.ManagementSystem.getOrCreatePropertyKey(ManagementSystem.java:1478)
at Test7.main(Test7.java:20)
...
2023-05-11 08:18:50,029 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-11T13:18:50.029694Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: This management system instance has been closed
at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
at org.janusgraph.graphdb.database.management.ManagementSystem.ensureOpen(ManagementSystem.java:228)
at org.janusgraph.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:233)
at Test7.main(Test7.java:17)
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.graphdb.database.management.ManagementSystem;
public class Test7 {
private static final Logger logger = LogManager.getLogger(Main.class);
public static void main(String[] args) throws InterruptedException {
JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:19042").open();
ManagementSystem janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
janusGraphManagement.rollback();
janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
if (!janusGraphManagement.containsGraphIndex("_id"))
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
janusGraphManagement.commit();
ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
// janusGraphManagement.updateIndex(janusGraphManagement.getGraphIndex("_id"), SchemaAction.REINDEX);
GraphTraversalSource g = janusGraph.traversal();
logger.info("g.V().count().next():\t" + g.V().count().next());
g.V().drop().iterate();
logger.info("g.V().count().next():\t" + g.V().count().next());
janusGraph.close();
}
}
Short answer: JanusGraph wants it SchemaStatus.ENABLED.
Which I can get ENABLED
, but the issue still persists for Cassandra.
[INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] :: Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
...
2023-05-15 09:58:05,984 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-15T14:58:05.984686Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@48b4a043
2023-05-15 09:58:06,047 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-15 09:58:06,270 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:06,275 [INFO] [Main.main] :: drop g.V().hasLabel("entity").count().next(): 0
2023-05-15 09:58:07,020 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:07,033 [INFO] [Main.main] :: addVertex g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:58:07,046 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] :: Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
...
[o.j.g.d.m.GraphIndexStatusWatcher.main] :: Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
2023-05-15 09:59:07,520 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] :: Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-15 09:59:07,522 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:07,531 [INFO] [Main.main] :: awaitGraphIndexStatus g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:59:07,627 [INFO] [o.j.g.o.j.IndexRepairJob.Thread-68] :: Index _id metrics: success-tx: 2 doc-updates: 0 succeeded: 1
...
2023-05-15 09:59:08,297 [INFO] [o.j.g.d.m.ManagementSystem.Thread-51] :: Index update job successful for [_id]
2023-05-15 09:59:08,299 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:08,304 [INFO] [Main.main] :: updateIndex g.V().hasLabel("entity").count().next(): 1
Process finished with exit code 0
public class Test {
private static final Logger logger = LogManager.getLogger(Main.class);
public static void main(String[] args) throws InterruptedException, ExecutionException {
JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
GraphTraversalSource g = janusGraph.traversal();
g.V().drop().iterate();
janusGraph.tx().commit();
janusGraph.tx().open();
logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
JanusGraphVertex vertex = janusGraph.addVertex("entity");
vertex.property("_id", "Test1");
janusGraph.tx().commit();
janusGraph.tx().open();
logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
if (!janusGraphManagement.containsGraphIndex("_id"))
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
janusGraphManagement.commit();
janusGraphManagement = janusGraph.openManagement();
GraphIndexStatusReport report = ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
logger.info("awaitGraphIndexStatus g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
JanusGraphIndex test = janusGraphManagement.getGraphIndex("_id");
janusGraphManagement.updateIndex(test, SchemaAction.REINDEX).get();
janusGraphManagement.commit();
logger.info("updateIndex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
janusGraph.close();
}
}