neo4j

Variable execution time for the same query in Neo4j


I am running the same query in Neo4j and noticing that the execution time varies significantly. For example, the query might execute in 5-6 seconds, while at another time it takes 2-3 minutes, even though the data remains unchanged.

:param {
  idsToExclude: []
};

:auto LOAD CSV WITH HEADERS FROM ('file:///PERSON_DATA.csv') AS row
WITH row
WHERE NOT row.`person_id` IN $idsToExclude AND NOT row.`person_id` IS NULL
CALL {
  WITH row
  MERGE (n: `Person` { `person_id`: row.`person_id` })
  SET n.`person_id` = row.`person_id`
  SET n.`name` = row.`name`
  SET n.`age` = toInteger(row.`age`)
  SET n.`email` = row.`email`
  SET n.`address` = row.`address`
  SET n.`creation_date` = datetime(row.`creation_date`)
  SET n.`last_modified_date` = datetime(row.`last_modified_date`)
} IN TRANSACTIONS OF 5000 ROWS;

Here are the settings I have uncommented in my config file, in case they are relevant:

server.memory.heap.initial_size=8G
server.memory.heap.max_size=16G
dbms.memory.transaction.total.max=32G

EXPLAIN of my query (plan screenshots, not reproduced here): First part, Second part

Why can the execution of the same query in Neo4j take different amounts of time? What factors can influence this, and how can I optimize its performance?


Solution

  • The slowdown is caused by the missing index on :Person(person_id). That explains why early runs (with little data in the graph) finish quickly, while later runs, or runs over a larger CSV, become increasingly expensive as more data is loaded.

    Without an index, the cost of the MATCH part of each MERGE (a MERGE behaves like a MATCH followed by a CREATE if nothing matched) grows linearly with the number of :Person nodes in the graph, because every row triggers a scan over all nodes with that label. With an index, the cost still rises, but only as O(log n), so it should remain efficient.

    Add the index, confirm that the index is used in the EXPLAIN plan (you should see a NodeIndexSeek operator and no NodeByLabelScan operators), then test again.
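
The steps above could look roughly like the following. This is a minimal sketch assuming Neo4j 5 syntax (the server.* setting names in the question suggest version 5); the index name person_person_id, the 300-second timeout, and the literal 'some-id' are placeholders chosen for illustration and do not come from the question.

```cypher
// Create a range index on :Person(person_id).
// The index name is an assumption; any unused name works.
CREATE INDEX person_person_id IF NOT EXISTS
FOR (n:Person) ON (n.person_id);

// Wait (up to 300 seconds) for the index to finish populating
// before starting the import.
CALL db.awaitIndexes(300);

// Confirm that the index exists and that its state is ONLINE.
SHOW INDEXES YIELD name, state, labelsOrTypes, properties
WHERE 'Person' IN labelsOrTypes
RETURN name, state, properties;

// Inspect the plan for the lookup that MERGE performs.
// 'some-id' is a placeholder value; EXPLAIN does not execute the query.
EXPLAIN
MERGE (n:Person {person_id: 'some-id'})
RETURN n;
```

If the plan still shows NodeByLabelScan, the index is most likely not online yet, or the label or property name in the index does not match the one used in the MERGE.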