
Importing JSON into Neo4j


[PROBLEM - My final solution below]

I'd like to import a JSON file containing my data into Neo4j, but my current approach is very slow.

The JSON file is structured as follows:

{
    "graph": {
        "nodes": [
            { "id": 3510982, "labels": ["XXX"], "properties": { ... } },
            { "id": 3510983, "labels": ["XYY"], "properties": { ... } },
            { "id": 3510984, "labels": ["XZZ"], "properties": { ... } },
            ...
        ],
        "relationships": [
            { "type": "bla", "startNode": 3510983, "endNode": 3510982, "properties": {} },
            { "type": "bla", "startNode": 3510984, "endNode": 3510982, "properties": {} },
            ...
        ]
    }
}
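The import relies on every "startNode"/"endNode" value being the "id" of a node in the same file. As a sanity check on a file shaped like the one above, a short Python sketch (standard library only) can confirm that property before loading anything into Neo4j. The miniature file content and the helper name `dangling_endpoints` are hypothetical; the real "properties" values are elided in the question and are left empty here.

```python
import json

# Hypothetical miniature of the file above; real "properties" values are elided.
RAW = """
{
  "graph": {
    "nodes": [
      {"id": 3510982, "labels": ["XXX"], "properties": {}},
      {"id": 3510983, "labels": ["XYY"], "properties": {}},
      {"id": 3510984, "labels": ["XZZ"], "properties": {}}
    ],
    "relationships": [
      {"type": "bla", "startNode": 3510983, "endNode": 3510982, "properties": {}},
      {"type": "bla", "startNode": 3510984, "endNode": 3510982, "properties": {}}
    ]
  }
}
"""

def dangling_endpoints(doc):
    """Return relationships whose startNode or endNode is not a node id in the file."""
    graph = doc["graph"]
    node_ids = {node["id"] for node in graph["nodes"]}
    return [
        rel for rel in graph["relationships"]
        if rel["startNode"] not in node_ids or rel["endNode"] not in node_ids
    ]

doc = json.loads(RAW)
print(len(doc["graph"]["nodes"]), len(doc["graph"]["relationships"]))  # 3 2
print(dangling_endpoints(doc))  # []
```

An empty result means every relationship can be resolved from ids in the file itself.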

This structure is similar to the one proposed here: How can I restore data from a previous result in the browser?

Looking at the answer there, I discovered that I can use

CALL apoc.load.json("file:///test.json") YIELD value AS row
WITH row, row.graph.nodes AS nodes
UNWIND nodes AS node
CALL apoc.create.node(node.labels, node.properties) YIELD node AS n
SET n.id = node.id

and then

CALL apoc.load.json("file:///test.json") YIELD value AS row
WITH row
UNWIND row.graph.relationships AS rel
MATCH (a) WHERE a.id = rel.startNode
MATCH (b) WHERE b.id = rel.endNode
CALL apoc.create.relationship(a, rel.type, rel.properties, b) YIELD rel AS r
RETURN *

(I have to do it in two passes because otherwise relationships get duplicated due to the two UNWINDs.)
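The duplication comes from how UNWIND multiplies rows: unwinding the nodes yields one row per node, and unwinding the relationships on each of those rows yields one row per (node, relationship) pair, so each relationship would be created once per node. The pure-Python model below is not Neo4j code: Python tuples stand in for Cypher rows, the data is a hypothetical three-node sample, and the function names are illustrative. It shows the row counts with and without aggregating back to a single row (as COLLECT does) before the second UNWIND.

```python
# Hypothetical input: 3 nodes and 2 relationships, as in the sample file.
nodes = [{"id": 3510982}, {"id": 3510983}, {"id": 3510984}]
rels = [
    {"type": "bla", "startNode": 3510983, "endNode": 3510982},
    {"type": "bla", "startNode": 3510984, "endNode": 3510982},
]

def unwind_both(nodes, rels):
    """Model of UNWIND nodes ... UNWIND rels with no aggregation in between."""
    rows = []
    for node in nodes:        # first UNWIND: one row per node
        for rel in rels:      # second UNWIND runs once for every existing row
            rows.append((node, rel))
    return rows

def unwind_after_collect(nodes, rels):
    """Model of collecting the node rows into one row before UNWIND rels."""
    collected = [node["id"] for node in nodes]   # a single row holding the list
    return [(collected, rel) for rel in rels]    # second UNWIND: one row per relationship

print(len(unwind_both(nodes, rels)))           # 6, so each relationship appears 3 times
print(len(unwind_after_collect(nodes, rels)))  # 2
```

This is why a single-query import needs an aggregation between the two UNWINDs, whereas running two separate queries avoids the issue at the cost of reading the file twice.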

But this two-pass import is very slow because I have a lot of entities, and I suspect that each MATCH searches through all of them for every relationship.
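The suspicion is reasonable: a pattern like MATCH (a) WHERE a.id = rel.startNode has no label, so no property index can be used and every relationship row triggers a scan over all nodes. The rough Python model below counts comparisons for that strategy versus building an id-to-node map once and doing two lookups per relationship. The sizes are hypothetical and the counts are a cost model, not a measurement of Neo4j itself.

```python
def scan_lookup(node_ids, rels):
    """Model of MATCH (a) WHERE a.id = x without an index: compare x with every node."""
    comparisons = 0
    pairs = []
    for rel in rels:
        start = end = None
        for node_id in node_ids:      # full scan for the start node
            comparisons += 1
            if node_id == rel["startNode"]:
                start = node_id
        for node_id in node_ids:      # another full scan for the end node
            comparisons += 1
            if node_id == rel["endNode"]:
                end = node_id
        pairs.append((start, end))
    return pairs, comparisons

def map_lookup(node_ids, rels):
    """Model of building an id -> node map once, then looking up each endpoint."""
    by_id = {node_id: node_id for node_id in node_ids}   # one pass over the nodes
    pairs = [(by_id.get(rel["startNode"]), by_id.get(rel["endNode"])) for rel in rels]
    cost = len(node_ids) + 2 * len(rels)                 # one insert per node, two lookups per rel
    return pairs, cost

# Hypothetical sizes: 200 nodes and 400 relationships between consecutive ids.
node_ids = list(range(200))
rels = [{"startNode": i % 200, "endNode": (i + 1) % 200} for i in range(400)]
scan_pairs, scan_cost = scan_lookup(node_ids, rels)
map_pairs, map_cost = map_lookup(node_ids, rels)
print(scan_cost)  # 160000
print(map_cost)   # 1000
```

Both strategies resolve the same endpoints; the scan grows with nodes × relationships, while the map grows with nodes + relationships.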

At the same time, I know that "startNode": 3510983 refers to a node id in the same file. So my question: is there any way to speed up the import process, for example by using the ids as an index, or something else?

Note that my nodes have different labels, so I did not find a way to create an index covering all of them, and I suppose such an index would be too large (memory).

[MY SOLUTION - not efficient answer 1]

CALL apoc.load.json('file:///test.json') YIELD value
WITH value.graph.nodes AS nodes, value.graph.relationships AS rels
UNWIND nodes AS n
CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
// Aggregating collapses the rows back to one, so rels is unwound only once
WITH rels, COLLECT({id: n.id, node: node, labels: labels(node)}) AS nMap
UNWIND rels AS r
// nMap is never used below: these unlabeled MATCHes still scan all nodes per relationship
MATCH (w {id: r.startNode})
MATCH (y {id: r.endNode})
CALL apoc.create.relationship(w, r.type, r.properties, y) YIELD rel
RETURN rel

[Final Solution in comment]


Solution

  • The final answer, which also seems efficient, is the following:

    It is inspired by the solution and discussion in this answer: https://stackoverflow.com/a/61464839/5257140

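    The main change compared with my first solution is that the created nodes are collected into a map keyed by id (apoc.map.fromPairs), and relationship endpoints are looked up in that map instead of being found with MATCH, so no node scan is needed. Note that this query reads value.nodes and value.relationships; for a file wrapped in "graph" like the one in the question, use value.graph.nodes and value.graph.relationships instead: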

    CALL apoc.load.json('file:///test-graph.json') YIELD value
    WITH value.nodes AS nodes, value.relationships AS rels
    // Create every node, storing the original JSON id as an "id" property
    UNWIND nodes AS n
    CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
    // Collapse back to one row holding a map from id (string key) to the created node
    WITH rels, apoc.map.fromPairs(COLLECT([n.id, node])) AS nMap
    UNWIND rels AS r
    // Resolve both endpoints with map lookups instead of MATCH
    WITH r, nMap[TOSTRING(r.startNode)] AS startNode, nMap[TOSTRING(r.endNode)] AS endNode
    WHERE startNode IS NOT NULL AND endNode IS NOT NULL
    CALL apoc.create.relationship(startNode, r.type, r.properties, endNode) YIELD rel
    RETURN rel
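The key idea of the final query is that apoc.map.fromPairs(COLLECT([n.id, node])) builds a map from each id to its freshly created node, so each relationship endpoint is resolved by a lookup rather than a MATCH. Cypher map keys are strings, which is why the endpoints are looked up with TOSTRING(r.startNode). The Python sketch below models that logic in memory only: plain dicts stand in for nodes and relationships, it does not talk to Neo4j, the helper name `import_graph` is hypothetical, and it reads the question's "graph" wrapper. It includes the WHERE ... IS NOT NULL filter that skips relationships whose endpoints are not in the file.

```python
import json

def import_graph(doc):
    """Pure-Python model of the final query: create nodes, index them by id, then link."""
    graph = doc["graph"]
    # Like apoc.create.node(..., apoc.map.setKey(n.properties, 'id', n.id)):
    # copy the properties and store the original id on the node.
    created = [
        {"labels": list(n["labels"]), "properties": {**n["properties"], "id": n["id"]}}
        for n in graph["nodes"]
    ]
    # Like apoc.map.fromPairs(COLLECT([n.id, node])): the map keys are strings.
    n_map = {str(node["properties"]["id"]): node for node in created}
    rels = []
    for r in graph["relationships"]:
        start = n_map.get(str(r["startNode"]))  # nMap[TOSTRING(r.startNode)]
        end = n_map.get(str(r["endNode"]))      # nMap[TOSTRING(r.endNode)]
        if start is None or end is None:        # WHERE startNode IS NOT NULL AND endNode IS NOT NULL
            continue
        rels.append({"type": r["type"], "start": start, "end": end,
                     "properties": dict(r["properties"])})
    return created, rels

# Hypothetical sample: node 3510984 is deliberately left out.
doc = json.loads('''{"graph": {
  "nodes": [
    {"id": 3510982, "labels": ["XXX"], "properties": {}},
    {"id": 3510983, "labels": ["XYY"], "properties": {}}
  ],
  "relationships": [
    {"type": "bla", "startNode": 3510983, "endNode": 3510982, "properties": {}},
    {"type": "bla", "startNode": 3510984, "endNode": 3510982, "properties": {}}
  ]}}''')

nodes, rels = import_graph(doc)
print(len(nodes), len(rels))                 # 2 1 (the relationship from 3510984 is skipped)
print(rels[0]["start"]["properties"]["id"])  # 3510983
```

Because the map is built once and each lookup is constant time on average, the work grows roughly linearly with the number of nodes plus relationships, instead of with their product.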