variablesneo4jparametersimpact-analysis

Neo4j: How to pass a variable to Neo4j Apoc (apoc.path.subgraphAll) Property


Am new to Neo4j and trying to do a POC by implementing a graph DB for Enterprise Reference / Integration Architecture (Architecture showing all enterprise applications as Nodes, Underlying Tables / APIs - logically grouped as Nodes, integrations between Apps as Relationships.

Objective is to achieve seamlessly 'Impact Analysis' using the strength of Graph DB (Note: I understand this may be an incorrect approach to achieve whatever am trying to achieve, so suggestions are welcome)

Let me come brief my question now,

There are four Apps - A1, A2, A3, A4; A1 has set of Tables (represented by a node A1TS1) that's updated by Integration 1 (relationship in this case) and the same set of tables are read by Integration 2. So the Data model looks like below

(A1TS1)<-[:INT1]-(A1)<-[:INT1]-(A2)
(A1TS1)-[:INT2]->(A1)-[:INT2]->(A4)

I have the underlying application table names captured as a List property in A1TS1 node.

Let's say one of the app table is altered for a new column or Data type and I wanted to understand all impacted Integrations and Applications. Now am trying to write a query as below to retrieve all nodes & relationships that are associated/impacted because of this table alteration but am not able to achieve this

Expected Result is - all impacted nodes (A1TS1, A1, A2, A4) and relationships (INT1, INT2)

Option 1 (Using APOC)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(type(r)) as allr 
CALL apoc.path.subgraphAll(STRTND, {relationshipFilter:allr}) YIELD nodes, relationships 
RETURN nodes, relationships

This faile with error Failed to invoke procedure 'apoc.path.subgraphAll': Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String

Option 2 (Using with, unwind, collect clause)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(r) as allr 
UNWIND allr as rels 
MATCH p=()-[rels]-()-[rels]-() 
RETURN p

This fails with error "Cannot use the same relationship variable 'rels' for multiple patterns" but if I use the [rels] once like p=()-[rels]=() it works but not yielding me all nodes

Any help/suggestion/lead is appreciated. Thanks in advance

Update Trying to give more context Showing the Underlying Data

MATCH (TC:TBLCON) RETURN TC

"TC"
{"Tables":["TBL1","TBL2","TBL3"],"TCName":"A1TS1","AppName":"A1"}  
{"Tables":["TBL4","TBL1"],"TCName":"A2TS1","AppName":"A2"} 

MATCH (A:App) RETURN A

"A"                                                                   
{"Sponsor":"XY","Platform":"Oracle","TechOwnr":"VV","Version":"12","Tags":["ERP","OracleEBS","FinanceSystem"],"AppName":"A1"}               
{"Sponsor":"CC","Platform":"Teradata","TechOwnr":"RZ","Tags":["EDW","DataWarehouse"],"AppName":"A2"}   

MATCH ()-[r]-() RETURN distinct r.relname

"r.relname"
"FINREP"   │  (runs between A1 to other apps)
"UPFRNT"   │  (runs between A2 to different Salesforce App)
"INVOICE"  │  (runs between A1 to other apps)

With this, here is what am trying to achieve Assume "TBL3" is getting altered in App A1, I wanted to write a query specifying the table "TBL3" in match pattern, get all associated relationships and connected nodes (upstream)

May be I need to achieve in 3 steps, Step 1 - Write a match pattern to find the start node and associated relationship(s) Step 2 - Store that relationship(s) from step 1 in a Array variable / parameter Step 3 - Pass the start node from step 1 & parameter from step 2 to apoc.path.subgraphAll to see all the impacted nodes

This may conceptually sound valid but how to do that technically in neo4j Cypher query is the question.

Hope this helps


Solution

  • WIth some lead from cybersam's response, the below query gets me what I want. Only constraint is, this result is limited to 3 layers (3rd layer through Optional Match)

    MATCH (TC:TBLCON) WHERE 'TBL3' IN TC.Tables 
    CALL apoc.path.subgraphAll(TC, {maxLevel:1}) YIELD nodes AS invN, relationships AS invR
    WITH TC, REDUCE (tmpL=[], tmpr IN invR | tmpL+type(tmpr)) AS impR
    MATCH FLP=(TC)-[]-()-[FLR]-(SL) WHERE type(FLR) IN impR 
    WITH FLP, TC, SL,impR 
    OPTIONAL MATCH SLP=(SL)-[SLR]-() WHERE type(SLR) IN impR RETURN FLP,SLP
    

    This works for my needs, hope this might also help someone.

    Thanks everyone for the responses and suggestions

    ****Update****

    Enhanced the query to get rid of Optional Match criteria and other given limitations

    MATCH (initTC:TBLCON) WHERE $TL IN initTC.Tables 
    WITH Reduce(O="",OO in Reduce (I=[], II in collect(apoc.node.relationship.types(initTC)) | I+II) | O+OO+"|") as RF
    MATCH (TC:TBLCON) WHERE $TL IN TC.Tables 
    CALL apoc.path.subgraphAll(TC,{relationshipFilter:RF}) YIELD nodes, relationships
    RETURN nodes, relationships
    

    Thanks all (especially cybersam)