janusgraph

JanusGraph times out on most queries


I am using JanusGraph 0.6.3 with Cassandra as the storage backend and Elasticsearch as the index backend. My graph has around 30 million vertices and 40 million edges. JanusGraph and Cassandra are both installed on the same machine, which has 128 GB of RAM and a 32-core CPU. (Elasticsearch is hosted on a separate machine with better specs.)

I have defined both composite and mixed indexes for the properties I use in my Gremlin queries:

Composite Index

management.buildIndex("seeker", Vertex.class)
        .addKey(management.getPropertyKey("p_uuid"))
        .indexOnly(management.getVertexLabel("seeker"))
        .buildCompositeIndex();
management.buildIndex("company", Vertex.class)
        .addKey(management.getPropertyKey("company_name"))
        .indexOnly(management.getVertexLabel("company"))
        .buildCompositeIndex();
management.buildIndex("start_date", Edge.class)
        .addKey(management.getPropertyKey("start_date"))
        .indexOnly(management.getEdgeLabel("worked"))
        .buildCompositeIndex();
management.buildIndex("end_date", Edge.class)
        .addKey(management.getPropertyKey("end_date"))
        .indexOnly(management.getEdgeLabel("worked"))
        .buildCompositeIndex();

Mixed Index

management.buildIndex("vertex", Vertex.class)
        .addKey(management.getPropertyKey("p_uuid"))
        .addKey(management.getPropertyKey("company_name"))
        .buildMixedIndex(this.graphCreds.getIndex_index_name());
management.buildIndex("edge", Edge.class)
        .addKey(management.getPropertyKey("start_date"))
        .addKey(management.getPropertyKey("end_date"))
        .addKey(management.getPropertyKey("current"))
        .buildMixedIndex(graphCreds.getIndex_index_name());

The query that I am running on this graph is:

g.V().hasLabel("seeker").has("p_uuid", <p_uuid>).outE("worked").as_("e1").inV().inE("worked").as_("e2")\
                .outV().as_("b").outE("worked").where(inV().hasLabel("company").has("company_name", <company_name>)).has("current", 1)\
                .where("e1", lt("e2")).by("start_date").by("end_date").where("e1", gt("e2")).by("end_date").by("start_date")\
                .select("e1", "e2", "b").range_(0,10).toList()

As you can see, I am using 4 properties in my Gremlin query, and all 4 are indexed in both the composite and mixed indexes. So why does the query time out?

I have also checked the machine's performance metrics: CPU utilization stays below 50%. If the machine has spare CPU capacity, why is JanusGraph not using it?

I have left all JanusGraph configuration options at their default values.

One more thing I am unsure about: when I profile the above query, it hits only the seeker index (from the first part of the query). Shouldn't it hit the company index as well, and the start_date and end_date indexes too?


Solution

  • it hits only the seeker index (from the first part of the query). Shouldn't it hit the company index as well, and the start_date and end_date indexes too?

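    Global (graph) indexes, whether composite or mixed, are (almost) only used at the first step of your query, to look up the starting elements. Once the traversal moves on from those vertices, they no longer apply. If you want to speed up edge traversal, you'd want to use vertex-centric indexes.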

    References:

    1. https://li-boxuan.medium.com/janusgraph-deep-dive-part-2-demystify-indexing-d26e71edb386
    2. https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8
    3. https://docs.janusgraph.org/schema/index-management/index-performance/#vertex-centric-indexes
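
As a rough sketch of the vertex-centric index suggested above, the following uses `JanusGraphManagement.buildEdgeIndex` on the `worked` edge label from the question. The class name `WorkedEdgeIndexes` and the index names `workedByCurrent` and `workedByStartDate` are made up for illustration, and the choice of `Direction.BOTH` and ascending order is an assumption rather than a tuned setting.

```java
import org.apache.tinkerpop.gremlin.process.traversal.Order;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.janusgraph.core.EdgeLabel;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

// Hypothetical helper: defines vertex-centric indexes on "worked" edges.
public final class WorkedEdgeIndexes {

    public static void build(JanusGraph graph) {
        JanusGraphManagement management = graph.openManagement();

        // Edge label and property keys already defined in the question's schema.
        EdgeLabel worked = management.getEdgeLabel("worked");
        PropertyKey current = management.getPropertyKey("current");
        PropertyKey startDate = management.getPropertyKey("start_date");

        // Index names are illustrative. Direction.BOTH is assumed because
        // the query walks "worked" edges with both outE() and inE().
        management.buildEdgeIndex(worked, "workedByCurrent",
                Direction.BOTH, Order.asc, current);
        management.buildEdgeIndex(worked, "workedByStartDate",
                Direction.BOTH, Order.asc, startDate);

        management.commit();
    }
}
```

Because the `worked` label already has data, a newly built index is not usable right away; it has to go through the reindexing procedure described in the index management docs linked above. Also note that a vertex-centric index only helps when the edge filter is written directly on the edge step, for example `outE("worked").has("current", 1)`. Comparisons between two labelled edges via `where(...).by(...)`, as in the query from the question, are likely still evaluated in memory.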