neo4j, cypher, neo4jclient

Neo4j - Adding Large # of Relationships - Performance


The Setup

I have a .NET application that uses the Neo4j client library to perform CRUD operations on a backend Neo4j database. The nodes and relationships in this database represent formulas & parameters.

There are two node labels: BusinessRule and ChartField.

Each BusinessRule node has a single OUTPUTS relationship to a ChartField node, plus one or more USES relationships to other ChartField nodes that represent the parameters of the BusinessRule formula.

When a user finishes configuring all their rules, they publish their changes, which will rebuild the graph.

The Problem

I'm struggling with the performance of adding the relationships in the graph.

In a single "publish", I may have 3,000 distinct BusinessRules. Adding all of the BusinessRule and ChartField nodes happens quickly, and performance there is not an issue.

But adding the relationships for each of the 3,000 BusinessRules is taking a very long time.

Below is an example of the Cypher query that would add the relationships for a single BusinessRule. It has to run 3,000 times to complete the task. I do have an index on the Id property for both node types.

MATCH (p1:BusinessRule {Id: '2025-BUDGET-10000184-11061345'})  
MATCH (t1:ChartField {Id: '2025-BUDGET-11061345'})  
MATCH (v1:ChartField {Id: '2025-BUDGET-11061472'})  
MATCH (v2:ChartField {Id: '2025-BUDGET-11062722'})  
CREATE (p1)-[:OUTPUTS {Type: 'OUTPUTS', TargetCFID: 11061345, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(t1)  
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: 11061472, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(v1)  
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: 11062722, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(v2)

This query works by matching the relevant BusinessRule and ChartField nodes where p1 is the BusinessRule, t1 is the target ChartField node and v* are the ChartField parameters. Then with those, we can add the relationships.

Does anyone have any suggestions on how to speed up this process? It took almost 22 minutes to execute this Cypher query 3,000 times.

I've considered batching up to 100 of these Cypher queries together to save on round trips, but that gets a bit complex, as the alias names all have to be unique.

In some batching tests, I saw some improvement, but nothing significant.


Solution

  • Issuing thousands of separate queries is definitely not good practice. Not only does it involve a lot of unnecessary networking and transaction overhead, but the Neo4j server also has to parse and plan every query individually.

    You should be able to create the relationships easily and efficiently using a single query.

    First, create a 3000-element list of maps, where each map has the format:

    {p1: '2025-BUDGET-10000184-11061345', 
     t1: '2025-BUDGET-11061345', t1CFID: '...', t1Br: '...',
     v1: '...', v1CFID: '...', v1Br: '...',
     v2: '...', v2CFID: '...', v2Br: '...'}
    

    Then, pass that list as a $data parameter to this query:

    UNWIND $data AS d
    MATCH (p1:BusinessRule {Id: d.p1})  
    MATCH (t1:ChartField {Id: d.t1})  
    MATCH (v1:ChartField {Id: d.v1})  
    MATCH (v2:ChartField {Id: d.v2})  
    CREATE (p1)-[:OUTPUTS {Type: 'OUTPUTS', TargetCFID: d.t1CFID, BusinessRuleId: d.t1Br}]->(t1)  
    CREATE (p1)-[:USES {Type: 'USES', ParamCFID: d.v1CFID, BusinessRuleId: d.v1Br}]->(v1)  
    CREATE (p1)-[:USES {Type: 'USES', ParamCFID: d.v2CFID, BusinessRuleId: d.v2Br}]->(v2)
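
The query above assumes exactly two USES parameters per rule, while the question mentions one or more. One possible variation, sketched here in Python only to show the parameter shape, nests the parameters in a list and unwinds them a second time. The key names (ruleId, targetId, targetCFID, uses) and the helper build_row are hypothetical, and the Cypher text is an untested adaptation of the query above rather than something taken from the original answer.

```python
from typing import Dict, List, Tuple

# Assumed variant of the query: one row per rule, with a nested "uses" list.
# The WITH clause separates the first CREATE from the second UNWIND.
CREATE_RELATIONSHIPS = """
UNWIND $data AS d
MATCH (p:BusinessRule {Id: d.ruleId})
MATCH (t:ChartField {Id: d.targetId})
CREATE (p)-[:OUTPUTS {Type: 'OUTPUTS', TargetCFID: d.targetCFID, BusinessRuleId: d.ruleId}]->(t)
WITH p, d
UNWIND d.uses AS u
MATCH (v:ChartField {Id: u.id})
CREATE (p)-[:USES {Type: 'USES', ParamCFID: u.cfid, BusinessRuleId: d.ruleId}]->(v)
"""


def build_row(rule_id: str,
              target: Tuple[str, int],
              params: List[Tuple[str, int]]) -> Dict[str, object]:
    """Build one element of $data (hypothetical helper and key names)."""
    target_id, target_cfid = target
    return {
        "ruleId": rule_id,
        "targetId": target_id,
        "targetCFID": target_cfid,
        # Any number of parameters fits here, so no v1/v2/... aliases are needed.
        "uses": [{"id": pid, "cfid": cfid} for pid, cfid in params],
    }


# Example row using the Ids from the question.
data = [
    build_row(
        "2025-BUDGET-10000184-11061345",
        ("2025-BUDGET-11061345", 11061345),
        [("2025-BUDGET-11061472", 11061472),
         ("2025-BUDGET-11062722", 11062722)],
    )
]
```

The resulting list would be passed as the $data parameter in the same way as the flat maps; the only difference is that the number of USES relationships per rule is driven by the length of each "uses" list.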
    

    Since your $data would only have 3000 elements, there should be no issue with the transaction running out of memory. But if you had a very large amount of data, then you should consider breaking it down into manageable chunks.
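
As a rough illustration of breaking a larger data set into manageable chunks, the sketch below splits the list of maps into fixed-size batches on the client side. The helper name chunked and the batch size of 1000 are assumptions chosen for the example, not values from the answer; each batch would then be sent as its own $data parameter, ideally in its own transaction.

```python
from typing import Dict, Iterator, List


def chunked(rows: List[Dict[str, object]], size: int) -> Iterator[List[Dict[str, object]]]:
    """Yield consecutive slices of `rows`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Placeholder rows standing in for the real list of maps.
rows = [{"ruleId": f"rule-{i}"} for i in range(2500)]

# Each batch would be passed as $data in a separate query execution.
batches = list(chunked(rows, 1000))
batch_sizes = [len(batch) for batch in batches]
```

A suitable batch size depends on the size of each map and on the memory configured for the server, so it is worth measuring rather than assuming that 1000 is right.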