neo4jcypher

Neo4j Cypher query for counting nodes in a list with a condition


I have the following Cypher query that returns the country code along with the count of profiles in each country:

MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)

OPTIONAL MATCH path1 = (loc)-[:CONTAINS*0..]->(country:Location)
WHERE country.countryCode IS NOT NULL
WITH p, loc, country, path1
ORDER BY length(path1) ASC
WITH p, loc, head(collect(country)) AS nearestCountry

OPTIONAL MATCH path2 = (reverseCountry:Location)-[:CONTAINS*0..]->(loc)
WHERE reverseCountry.countryCode IS NOT NULL AND nearestCountry IS NULL
WITH p, coalesce(nearestCountry, reverseCountry) AS finalCountry
WHERE finalCountry IS NOT NULL

RETURN finalCountry.countryCode AS countryCode, COUNT(p) AS totalCount
ORDER BY totalCount DESC

Each Location node can have an unlimited nesting of other Locations through the CONTAINS relationship. In this chain, either above or below the Location, there is only one Location node with a NOT NULL countryCode. It seems that I find this in my query. But how can I also consider whether the loc itself is a country, meaning it has a countryCode IS NOT NULL, and then bypass those two checks with :CONTAINS*0.. in both directions and use the loc itself in the final count?


Solution

  • [UPDATED]

    You can use CALL subqueries containing UNIONed non-overlapping queries. To optimize performance (by avoiding unnecessary path traversals), my answer uses nested CALL subqueries.

    This first query shows the syntax to use prior to neo4j 5.24, which requires you to "import" variables from outside the CALL subquery using an initial WITH clause. As an optimization, LIMIT 1 is used to avoid uselessly evaluating more paths after the only-possible countryCode has already been found. And the nested CALL subquery allows us to avoid calling MATCH (country2:Location)-[:CONTAINS*]->(loc) when country1 already provides a countryCode.

    MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)
    CALL {
      WITH loc
      WITH loc
      WHERE loc.countryCode IS NOT NULL
      RETURN loc.countryCode AS countryCode
    UNION
      WITH loc
      WITH loc
      WHERE loc.countryCode IS NULL
      OPTIONAL MATCH (loc)-[:CONTAINS*]->(country1:Location)
      WITH loc, country1
      LIMIT 1
      CALL {
        WITH loc, country1
        WITH loc, country1
        WHERE country1.countryCode IS NOT NULL
        RETURN country1.countryCode AS countryCode
      UNION
        WITH loc, country1
        WITH loc, country1
        WHERE country1.countryCode IS NULL
        MATCH (country2:Location)-[:CONTAINS*]->(loc)
        RETURN country2.countryCode AS countryCode
        LIMIT 1
      }
      RETURN countryCode
    }
    RETURN countryCode, COUNT(p) AS totalCount
    ORDER BY totalCount DESC
    

    Starting with neo4j 5.24, the variable-import syntax is deprecated, and you can use the new "optional call" syntax:

    MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)
    CALL (loc) {
      WITH loc
      WHERE loc.countryCode IS NOT NULL
      RETURN loc.countryCode AS countryCode
    UNION
      WITH loc
      WHERE loc.countryCode IS NULL
      OPTIONAL MATCH (loc)-[:CONTAINS*]->(country1:Location)
      WITH loc, country1
      LIMIT 1
      CALL (loc, country1) {
        WITH loc, country1
        WHERE country1.countryCode IS NOT NULL
        RETURN country1.countryCode AS countryCode
      UNION
        WITH loc, country1
        WHERE country1.countryCode IS NULL
        MATCH (country2:Location)-[:CONTAINS*]->(loc)
        RETURN country2.countryCode AS countryCode
        LIMIT 1
      }
      RETURN countryCode
    }
    RETURN countryCode, COUNT(p) AS totalCount
    ORDER BY totalCount DESC
    

    Note that variable-length paths with no upper bound on the length [like (loc)-[:CONTAINS*]->(country:Location)] are bad practice due to the possibility of performance and instability issues - unless you are sure that your paths are always going to be pretty short. So you may want to impose a reasonable upper bound on such path patterns [like (loc)-[:CONTAINS*..7]->(country:Location)]