neo4jcypherneo4j-apoc

Noe4j : how to get property names with count of non-null or non-empty value for a given node label


So there is a requirement of counting the number of occurrences of non null and non empty values of all properties for a given node label.

For example,

MATCH(p:Person) 
WHERE p.name IS NOT NULL and p.name <> "" 
RETURN COUNT(p.name);

This gives me a count for name properties for node label Person I have to get the count of such other 200 properties for this node label. I am able to get the count of the properties but I am not sure for conditional count like the above. Anything would be helpful as I have just started learning neo4j.

The query I am using for all properties count is

MATCH(p:Person)
WITH p
UNWIND keys(p) as key
WITH key, COUNT(keys(p)) as cnt
RETURN DISTINCT key, cnt
ORDER BY key

What needs to be modified in the query? Or my query is wrong to start with. Thanks in advance.


Solution

  • If you are inferring the possible properties from the Person nodes in the MATCH, then you can get the count of non-empty properties with the following:

    MATCH (p:Person)
    UNWIND keys(p) AS key
    WITH key, CASE WHEN p[key] <> "" THEN 1 ELSE 0 END AS isNonEmpty
    RETURN key, sum(isNonEmpty) AS cnt
    ORDER BY key
    

    Note that because Neo4j treats null properties as non-existent, then if none of the nodes returned by the MATCH statement has the property, those properties can't be inferred.

    The CASE can easily be adapted to exclude other property values from the count. For example, to exclude empty lists, you can write:

    MATCH (p:Person)
    UNWIND keys(p) AS key
    WITH key, 
         CASE p[key]
           WHEN [] THEN 0 
           WHEN "" THEN 0
           WHEN 0 THEN 0
           ELSE 1 
         END AS isNonEmpty
    RETURN key, sum(isNonEmpty) AS cnt
    ORDER BY key
    

    If you know the properties in advance because, for example, you know the schema, then you could provide them as a list or set of records. For example, if the properties were 'a' through to 'g', you could use the following:

    UNWIND ['a', 'b', 'c', 'd', 'e', 'f', 'g'] AS key
    MATCH (p)
    WITH key, CASE WHEN p[key] <> "" THEN 1 ELSE 0 END AS isNonEmpty
    RETURN key, sum(isNonEmpty) AS cnt
    ORDER BY key