I have the following Cypher query that returns the country code along with the count of profiles in each country:
MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)
OPTIONAL MATCH path1 = (loc)-[:CONTAINS*0..]->(country:Location)
WHERE country.countryCode IS NOT NULL
WITH p, loc, country, path1
ORDER BY length(path1) ASC
WITH p, loc, head(collect(country)) AS nearestCountry
OPTIONAL MATCH path2 = (reverseCountry:Location)-[:CONTAINS*0..]->(loc)
WHERE reverseCountry.countryCode IS NOT NULL AND nearestCountry IS NULL
WITH p, coalesce(nearestCountry, reverseCountry) AS finalCountry
WHERE finalCountry IS NOT NULL
RETURN finalCountry.countryCode AS countryCode, COUNT(p) AS totalCount
ORDER BY totalCount DESC
Each Location
node can have an unlimited nesting of other Locations
through the CONTAINS
relationship. In this chain, either above or below the Location
, there is only one Location
node with a NOT NULL countryCode
. It seems that I find this in my query. But how can I also consider whether the loc
itself is a country, meaning it has a countryCode IS NOT NULL
, and then bypass those two checks with :CONTAINS*0..
in both directions and use the loc
itself in the final count?
[UPDATED]
You can use CALL subqueries containing UNION
ed non-overlapping queries. To optimize performance (by avoiding unnecessary path traversals), my answer uses nested CALL subqueries.
This first query shows the syntax to use prior to neo4j 5.24, which requires you to "import" variables from outside the CALL subquery using an initial WITH
clause. As an optimization, LIMIT 1
is used to avoid uselessly evaluating more paths after the only-possible countryCode
has already been found. And the nested CALL subquery allows us to avoid calling MATCH (country2:Location)-[:CONTAINS*]->(loc)
when country1
already provides a countryCode
.
MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)
CALL {
WITH loc
WITH loc
WHERE loc.countryCode IS NOT NULL
RETURN loc.countryCode AS countryCode
UNION
WITH loc
WITH loc
WHERE loc.countryCode IS NULL
OPTIONAL MATCH (loc)-[:CONTAINS*]->(country1:Location)
WITH loc, country1
LIMIT 1
CALL {
WITH loc, country1
WITH loc, country1
WHERE country1.countryCode IS NOT NULL
RETURN country1.countryCode AS countryCode
UNION
WITH loc, country1
WITH loc, country1
WHERE country1.countryCode IS NULL
MATCH (country2:Location)-[:CONTAINS*]->(loc)
RETURN country2.countryCode AS countryCode
LIMIT 1
}
RETURN countryCode
}
RETURN countryCode, COUNT(p) AS totalCount
ORDER BY totalCount DESC
Starting with neo4j 5.24, the variable-import syntax is deprecated, and you can use the new "optional call" syntax:
MATCH (p:Profile {active: true})-[:LOCATED_IN]->(loc:Location)
CALL (loc) {
WITH loc
WHERE loc.countryCode IS NOT NULL
RETURN loc.countryCode AS countryCode
UNION
WITH loc
WHERE loc.countryCode IS NULL
OPTIONAL MATCH (loc)-[:CONTAINS*]->(country1:Location)
WITH loc, country1
LIMIT 1
CALL (loc, country1) {
WITH loc, country1
WHERE country1.countryCode IS NOT NULL
RETURN country1.countryCode AS countryCode
UNION
WITH loc, country1
WHERE country1.countryCode IS NULL
MATCH (country2:Location)-[:CONTAINS*]->(loc)
RETURN country2.countryCode AS countryCode
LIMIT 1
}
RETURN countryCode
}
RETURN countryCode, COUNT(p) AS totalCount
ORDER BY totalCount DESC
Note that variable-length paths with no upper bound on the length [like (loc)-[:CONTAINS*]->(country:Location)
] are bad practice due to the possibility of performance and instability issues - unless you are sure that your paths are always going to be pretty short. So you may want to impose a reasonable upper bound on such path patterns [like (loc)-[:CONTAINS*..7]->(country:Location)
]