neo4jcypherquery-optimization

The Neo4j query with one relationship is generating a lot of DB hits


I have the following query:

PROFILE
MATCH (c1:Criterion)-[rc1t:CONTAINS ]->(c1t:Translation)
WHERE c1.id IN [5890, 1666, 1283, 4548, 25157, 966, 9320, 2344, 8202, 1707, 461, 7279, 14897, 4147, 4598, 2966, 17784, 77304, 762, 10490, 6012, 3230]
RETURN count(rc1t)

which returns only 5 relationships.

Cypher version: 5, planner: COST, runtime: SLOTTED. 172821 total db hits in 45 ms.

This is the plan:

enter image description here

I am surprised by how many DB hits the database makes for this relationship: (c1:Criterion)-[rc1t:CONTAINS ]->(c1t:Translation)

But I have

MATCH()-[rc1t:CONTAINS]-() RETURN count(rc1t)

3295092 relatioships...

Is there anything that can be done to improve the performance of this query?


Solution

  • Your query found 22 Criterion nodes, and they have a total of 86,352 outgoing CONTAINS relationships. But only 5 of the nodes on the other end of those relationships have the Translation label.

    Approach 1

    To make this use case more performant, you could create CONTAINS_TRANSLATION relationships between appropriate Criterion/Translation pairs, and then change your query to the following. You can keep the existing CONTAINS relationships between the same node pairs if that helps with your other uses cases.

    Note that c1t no longer needs to specify a label, which further speeds up the query.

    MATCH (c1:Criterion)-[rc1t:CONTAINS_TRANSLATION]->(c1t)
    WHERE c1.id IN [5890, 1666, 1283, 4548, 25157, 966, 9320, 2344, 8202, 1707, 461, 7279, 14897, 4147, 4598, 2966, 17784, 77304, 762, 10490, 6012, 3230]
    RETURN count(rc1t)
    

    Approach 2

    If you want to avoid adding a new relationship type (and multiple relationships between some node pairs), you could try this instead:

    1. Create a relationship index for a new dest property on the CONTAINS relationship. (Not all CONTAINS relationships will need to have that property.)

      CREATE INDEX CONTAINS_dest FOR ()-[c:CONTAINS]-() ON (c.dest)
      
    2. Add a dest: 'Translation' property to the CONTAINS relationships that end in a Translation node.

    3. Hint in your query that you want the Cypher planner to generate a plan that uses the new relationship index instead of the index you already have on the Criterion nodes:

      MATCH (c1:Criterion)-[rc1t:CONTAINS]->(c1t)
      USING INDEX rc1t:CONTAINS(dest)
      WHERE
        rc1t.dest = 'Translation' AND
        c1.id IN [5890, 1666, 1283, 4548, 25157, 966, 9320, 2344, 8202, 1707, 461, 7279, 14897, 4147, 4598, 2966, 17784, 77304, 762, 10490, 6012, 3230]
      RETURN count(rc1t)
      

      Note, however, that this approach will only be faster if there are fewer than 86,352 CONTAINS relationships with the desired dest value. Also, if all CONTAINS relationships start from a Criterion node, you can speed up the query further by not bothering to filter by that label (i.e., use MATCH (c1)-[rc1t:CONTAINS_TRANSLATION]->(c1t)).