neo4jcypherneo4j-apocknowledge-graph

Is there a better approach to import a spreadsheet with multiple columns that may not have data for every row into Neo4j? Data set image included


Objectives - To add each values in the columns as a node. Each column represents a label type. The colored column is a property of Field column.

The query I used to ingest this into Neo4j Aura is:

LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/e/2PACX-1vTlOK8lOIzMR1E6YB-KDqMwsCSSrd/pub?output=csv" AS line
MERGE (m:Module {name: line.Module}) 
WITH m, line
MERGE (m)-[:CONTAINS_SUBMODULE]->(s:SubModule {name: line.SubModule})
WITH s, line
MERGE (s)-[:CONTAINS_MENU]->(m:Menu {name: line.Menu})
WITH m, line
WHERE line.SubMenu IS NOT NULL
MERGE (m)-[:CONTAINS_SUB_MENU]->(sm:SubMenu{name:line.SubMenu})
WITH sm, line
WHERE line.Screen IS NOT NULL
MERGE (sm)-[:LAUNCHES]->(s:Screen{name:line})
WITH s, line
WHERE line.Panel IS NOT NULL
MERGE (s)-[:CONTAINS_PANEL]->(p:Panel{name:line})
WITH p, line
WHERE line.SubScreen IS NOT NULL
MERGE (p)-[:CONTAINS_SUBSCREEN]->(ss:SubScreen{name:line})
WITH ss, line
WHERE line.Field IS NOT NULL
MERGE (ss)-[:CONTAINS_FIELD]->(f:Field{name:line})
WITH f, line
WHERE line.Button IS NOT NULL
MERGE (f)-[:CONTAINS_BUTTON]->(b:Button{name:line})

It worked fine till I attempted to map the SubMenu with the Screen column. It threw the error:

Property values can only be of primitive types or arrays thereof. Encountered: Map{Panel -> String("Search"), Menu -> String("Block Status"), SubModule -> String("Booking"), SubMenu -> String("Status Codes"), Button -> NO_VALUE, Field -> NO_VALUE, SubScreen -> NO_VALUE, Mandatory Field -> NO_VALUE, Screen -> String("Status Codes"), NodeID -> String("115"), Module -> String("Administration")}.

Is there a more efficient way to add this spreadsheet into Neo4j Aura?


Solution

  • You need to load the data one at a time because neo4j does not allow null values when it creates a node for SubScreen, Panel, Field and Button. Also, neo4j is doing an "eager operator" (google this) when you are trying to load all nodes into one command. For large files, you should divide it into separate load commands such as below. Execute each of the load command one at a time.

    // load modules, submodule, menu and submenu
    LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS line
    MERGE (m:Module {name: line.Module}) 
    WITH m, line
    MERGE (m)-[:CONTAINS_SUBMODULE]->(s:SubModule {name: line.SubModule})
    WITH s, line
    MERGE (s)-[:CONTAINS_MENU]->(m:Menu {name: line.Menu})
    WITH m, line
    WHERE line.SubMenu IS NOT NULL
    MERGE (m)-[:CONTAINS_SUB_MENU]->(sm:SubMenu{name:line.SubMenu})
    WITH sm, line
    WHERE line.Screen IS NOT NULL
    MERGE (sm)-[:LAUNCHES]->(s:Screen{name:line.Screen})
    
    //Load for Panel 
    LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS line
    WITH line WHERE line.Screen is NOT NULL AND line.Panel is NOT NULL
    MERGE (s:Screen {name: line.Screen}) 
    MERGE (s)-[:CONTAINS_PANEL]->(p:Panel{name:line.Panel})
    
    // Load for subscreen
    LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS line
    WITH line WHERE line.Panel is NOT NULL AND line.SubScreen is NOT NULL
    MERGE (p:Panel {name: line.Panel})
    MERGE (p)-[:CONTAINS_SUBSCREEN]->(ss:SubScreen{name:line.SubScreen})
    
    // Load for field
    LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS line
    WITH line WHERE line.SubScreen is NOT NULL and line.Field is NOT NULL
    MERGE (ss:SubScreen {name: line.SubScreen})
    MERGE (ss)-[:CONTAINS_FIELD]->(f:Field{name:line.Field})
    
    // Load for buttons
    LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS line
    WITH line WHERE line.Field IS NOT NULL and line.Button is NOT NULL
    MERGE (f:Field {name: line.Field})
    MERGE (f)-[:CONTAINS_BUTTON]->(b:Button{name:line.Button})
    

    result: enter image description here