neo4jcyphertournament

Remove redundant two way relationships in a Neo4j graph


I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:

enter image description here

The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.

The problem is obviously in the data, where there is a extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):

enter image description here

I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:

1) Don't read in the extra relationship in the first place

2) Go ahead and create the extra relationship, but then remove it later.

Thanks in advance for any advice on this.

The code I'm using is this:

/ Here, we load and create nodes

LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
  player_id: line.player_id
})

ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name

ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence

// Here create the edges

LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)

Solution

  • It is obvious that you don't need this extra relationship as it doesn't add any value nor weight to the graph.

    There is something that few people are aware of, despite being in the documentation.

    MERGE can be used on undirected relationships, neo4j will pick one direction for you (as realtionships MUST be directed in the graph).

    Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship

    An example with the following statement, if you run it for the first time :

    MATCH (a:User {name:'A'}), (b:User {name:'B'}) 
    MERGE (a)-[:VERSUS]-(b)
    

    It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.

    I guess it would solve your problem as you will not have to worry about cleaning the data in upfront nor run scripts afterwards for cleaning your graph.