[SOLVED] Variable execution time for the same query in Neo4j

Variable execution time for the same query in Neo4j

I am running the same query in Neo4j and noticing that the execution time varies significantly. For example, the query might execute in 5-6 seconds, while at another time it takes 2-3 minutes, even though the data remains unchanged.

:param {
  idsToExclude: []
};

:auto LOAD CSV WITH HEADERS FROM ('file:///PERSON_DATA.csv') AS row
WITH row
WHERE NOT row.`person_id` IN $idsToExclude AND NOT row.`person_id` IS NULL
CALL {
  WITH row
  MERGE (n: `Person` { `person_id`: row.`person_id` })
  SET n.`person_id` = row.`person_id`
  SET n.`name` = row.`name`
  SET n.`age` = toInteger(row.`age`)
  SET n.`email` = row.`email`
  SET n.`address` = row.`address`
  SET n.`creation_date` = datetime(row.`creation_date`)
  SET n.`last_modified_date` = datetime(row.`last_modified_date`)
} IN TRANSACTIONS OF 5000 ROWS;

here is some data from my config file that has been uncommented if needed

server.memory.heap.initial_size=8G
server.memory.heap.max_size=16G
dbms.memory.transaction.total.max=32G

EXPLAIN of my Query

Why can the execution of the same query in Neo4j take different amounts of time? What factors can influence this, and how can I optimize its performance?

Solution

The cause of the slowdown is that you do not have an index on :Person(person_id), that would explain why early runs (with little data) execute fast, but become increasingly expensive as data is loaded, or over a larger CSV to ingest.

Without an index, the cost of the MATCH part of the MERGE (since a MERGE is like a MATCH and then a CREATE, if no matches were found) increases linearly with the number of :Person nodes in the graph. With an index, there is still a rising cost, but it's log(n) complexity due to index use, so should remain efficient.

Add the index, confirm that the index is used in the EXPLAIN plan (you should see a NodeIndexSeek operator and no NodeByLabelScan operators), then test again.