
Cypher query: how to calculate the number of person clusters


In Neo4j, I have person nodes, and people know each other. With a Cypher query, how can I find out how many clusters of people there are? Additionally, how can I find how many of those clusters have more than one person?

MATCH (p1:person)-[:know*]->(p2:person)
WHERE id(p1) < id(p2)
WITH collect(DISTINCT p1) + collect(DISTINCT p2) AS groupNodes
UNWIND groupNodes AS node
WITH DISTINCT node AS person, groupNodes
RETURN count(DISTINCT groupNodes) AS numberOfGroups;

I tried the query above, but it uses so much memory that it never returns results. Is there an easier way to do this?


Solution

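  • You can use the Weakly Connected Components (WCC) algorithm from the Neo4j Graph Data Science (GDS) library, which must be installed on the Neo4j server. Each cluster of people who know each other, directly or indirectly, is one weakly connected component.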

    As an example, let's first create some sample data:

    CREATE
      (nAlice:Person {name: 'Alice'}),
      (nBridget:Person {name: 'Bridget'}),
      (nCharles:Person {name: 'Charles'}),
      (nDoug:Person {name: 'Doug'}),
      (nMark:Person {name: 'Mark'}),
      (nMichael:Person {name: 'Michael'}),
      (nStan:Person {name: 'Stan'}),
    
      (nAlice)-[:KNOWS]->(nBridget),
      (nAlice)-[:KNOWS]->(nCharles),
      (nMark)-[:KNOWS]->(nDoug),
      (nMark)-[:KNOWS]->(nMichael)
    

    Then, create a GDS projection (an in-memory copy of the specified nodes and relationships, for use by GDS algorithms):

    CALL gds.graph.project(
      'myGraph',
      'Person',
      'KNOWS'
    )
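    Labels and relationship types are case-sensitive, so the projection must use the names that exist in your own data. The question's query uses `:person` and `:know` rather than `:Person` and `:KNOWS`; assuming those are the real names, a projection for that schema might look like this (the graph name `personGraph` is just an illustrative choice, and the later queries would then use it instead of `'myGraph'`):

    ```cypher
    // Project the question's schema: label `person`, relationship type `know`.
    // 'personGraph' is a hypothetical name; any unused graph name works.
    CALL gds.graph.project('personGraph', 'person', 'know')
    YIELD graphName, nodeCount, relationshipCount
    RETURN graphName, nodeCount, relationshipCount
    ```

    WCC ignores relationship direction, so no special orientation setting should be needed for this use case.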
    

    And finally, run the WCC algorithm to find the groups of nodes that are connected through `KNOWS` relationships (relationship direction is ignored), and count how many such components there are. Note that a `Person` who does not know anyone forms a component of their own, containing just that one person.

    CALL gds.wcc.stream('myGraph') YIELD componentId
    RETURN COUNT(DISTINCT componentId) AS numberOfGroups
    

    The result is:

    ╒══════════════╕
    │numberOfGroups│
    ╞══════════════╡
    │3             │
    └──────────────┘
    
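    If you only need the totals and not per-node results, WCC's `stats` mode reports the number of components directly, along with a summary of component sizes. A sketch against the same `myGraph` projection:

    ```cypher
    // stats mode computes WCC without streaming one row per node
    CALL gds.wcc.stats('myGraph')
    YIELD componentCount, componentDistribution
    RETURN componentCount AS numberOfGroups,
           componentDistribution.max AS largestGroupSize
    ```

    For the sample data above, `numberOfGroups` should match the streamed count of 3.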

    To count only the components that contain at least two people:

    CALL gds.wcc.stream('myGraph') YIELD componentId
    WITH componentId, COUNT(*) AS nPeople
    WHERE nPeople > 1
    RETURN COUNT(componentId) AS numberOfGroupsWithMultiplePeople
    

    which returns:

    ╒════════════════════════════════╕
    │numberOfGroupsWithMultiplePeople│
    ╞════════════════════════════════╡
    │2                               │
    └────────────────────────────────┘
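    Both numbers can also be computed in a single pass over the stream. This sketch groups by `componentId` once and then uses a conditional `count` (which ignores `NULL`s) to count only the multi-person components:

    ```cypher
    CALL gds.wcc.stream('myGraph') YIELD componentId
    // one row per component, with its size
    WITH componentId, count(*) AS nPeople
    // count(CASE ...) only counts rows where the CASE yields a non-null value
    RETURN count(*) AS numberOfGroups,
           count(CASE WHEN nPeople > 1 THEN 1 END) AS numberOfGroupsWithMultiplePeople
    ```

    With the sample data this should return 3 and 2, the same values as the two separate queries.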
    

    Before projecting a large graph, you can also estimate how much memory the projection and the algorithm will need.
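    GDS exposes `.estimate` variants of its procedures for this. A sketch for this example, first for the projection (given the label and relationship type) and then for WCC on an existing projection:

    ```cypher
    // Estimate the memory needed to project all Person nodes and KNOWS relationships
    CALL gds.graph.project.estimate('Person', 'KNOWS')
    YIELD requiredMemory, nodeCount, relationshipCount
    RETURN requiredMemory, nodeCount, relationshipCount
    ```

    ```cypher
    // Estimate the memory needed to run WCC in stream mode on 'myGraph'
    CALL gds.wcc.stream.estimate('myGraph', {})
    YIELD requiredMemory, bytesMin, bytesMax
    RETURN requiredMemory, bytesMin, bytesMax
    ```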

    You should also drop the projection when you are done with it to free up memory.
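    Dropping the in-memory projection from this example is a single procedure call:

    ```cypher
    // Removes the in-memory graph only; the nodes and relationships in the database are untouched
    CALL gds.graph.drop('myGraph') YIELD graphName
    RETURN graphName
    ```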