neo4j cypher

Difference between "NOT v IN [a,b]" and "v<>a AND v<>b"


In Neo4j 5.26, queries 1 and 2 below return different results. Why? The only difference is that query 1 filters with AND (t.id <> 1 AND t.id <> 3) while query 2 filters with AND (NOT t.id IN [1,3]), and these two conditions should be semantically equivalent.

  1. Query 1

    MATCH(b:Bottom)
    OPTIONAL MATCH(b:Bottom)<-[:BOTTOM]-(m:Middle)
    OPTIONAL MATCH(m:Middle)<-[:MIDDLE]-(t:Top)
    WITH DISTINCT b, m, t
    WHERE (
        m.id IN [21,22,31]
        AND (NOT m.id IN [22,32])
        AND (t.id <> 1 AND t.id <> 3)   // only this line differs 
    )
    RETURN LABELS(t)[0], t.id, LABELS(m)[0], m.id, LABELS(b)[0], b.id;
    

    Result:

    LABELS(t)[0]    t.id    LABELS(m)[0]    m.id    LABELS(b)[0]    b.id
    "Top"   2   "Middle"    21  "Bottom"    211
    "Top"   2   "Middle"    21  "Bottom"    212
    
  2. Query 2

    MATCH(b:Bottom)
    OPTIONAL MATCH(b:Bottom)<-[:BOTTOM]-(m:Middle)
    OPTIONAL MATCH(m:Middle)<-[:MIDDLE]-(t:Top)
    WITH DISTINCT b, m, t
    WHERE (
        m.id IN [21,22,31]
        AND (NOT m.id IN [22,32])
        AND (NOT t.id IN [1,3])         // only this line differs 
    )
    RETURN LABELS(t)[0], t.id, LABELS(m)[0], m.id, LABELS(b)[0], b.id;
    

    Result:

    (no changes, no records)
    

You can create test data with the following Cypher statement:

WITH
    RANGE(1,4) AS tops,
    RANGE(1,3) AS middles,
    RANGE(1,2) AS bottoms
FOREACH (top IN tops | 
    MERGE (t:Top{id:top})
    FOREACH (middle IN middles | 
        CREATE (m:Middle{id:top*10 + middle})
        MERGE (t)-[:MIDDLE]->(m)
        FOREACH (bottom IN bottoms |
            CREATE (b:Bottom{id:top*100 + middle*10 + bottom})
            MERGE (m)-[:BOTTOM]->(b)
        )
    )
);

Displaying test data:

MATCH(t:Top)-[r1]-(m:Middle)-[r2]-(b:Bottom)
RETURN *;

Deleting test data:

MATCH(n) WHERE (n:Top OR n:Middle OR n:Bottom)
DETACH DELETE n
RETURN n;

Solution

  • First of all, thank you for including a query and test data that reproduce the issue; that is very helpful.

    This appears to be a bug: NOT t.id IN [1,3] and t.id <> 1 AND t.id <> 3 should produce the same result. The engineering team is investigating, and I will post here when I hear more.
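
    To check the equivalence in isolation, the two predicates can be evaluated side by side without touching the graph. This is only a minimal sketch: the aliases v, notIn and notEqualBoth are illustrative names, not part of the original queries, and the null row is included because the OPTIONAL MATCH clauses can leave t unbound.

    ```cypher
    // Evaluate both predicates for the same inputs, including null.
    // v stands in for t.id; null models a row where OPTIONAL MATCH found no Top node.
    UNWIND [1, 2, 3, null] AS v
    RETURN v,
           NOT v IN [1, 3]     AS notIn,
           (v <> 1 AND v <> 3) AS notEqualBoth;
    ```

    Under Cypher's documented three-valued logic, both columns should agree on every row: false for 1 and 3, true for 2, and null for null. If they agree here but the full queries still differ, the discrepancy likely lies in how the WITH DISTINCT ... WHERE form is planned rather than in the predicates themselves, although that is an assumption and not something confirmed in this thread.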