
SPARQL algebra: Excluding nodes based on triples they have


Take this graph:

:thing1 a :Foo ;
    :has :A ;
    :has :B .

:thing2 a :Foo ;
    :has :B ;
    :has :A .

:thing3 a :Foo ;
    :has :A ;
    :has :B ;
    :has :C .

I want to select :thing1 and :thing2, but NOT :thing3.

Here is the SPARQL query I wrote that works. Is there a better way to do this?

SELECT ?foo WHERE {
    ?foo a :Foo ;
        :has :A ;
        :has :B .
    MINUS {
        ?foo a :Foo ;
            :has :A ;
            :has :B ;
            :has ?anythingElse .
        FILTER(?anythingElse != :A && ?anythingElse != :B)
    }
}

Solution

  • An alternative to MINUS is FILTER NOT EXISTS:

    SELECT ?foo WHERE {
        ?foo a :Foo ;
            :has :A, :B .
        FILTER NOT EXISTS {
            ?foo :has ?other .
            FILTER (?other NOT IN (:A, :B))
        }
    }
    

    which says, loosely, find all ?foo with :A and :B, then check that they have no other :has value.

    In terms of execution efficiency, an optimizer may rewrite some MINUS patterns into FILTER NOT EXISTS and vice versa, and it may also be able to share common sub-patterns.

    Without an optimizer that smart, the FILTER NOT EXISTS version is likely to be faster because the pattern "?foo a :Foo ; :has :A, :B ." is not repeated, and the FILTER only considers items that have already matched that pattern.

    The only way to know for sure is to try both on real data, where all effects, including caching, come together.
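
To make the difference between the two strategies concrete, here is a rough Python sketch that models the example graph as a set of tuples and mimics how a naive engine without optimizations might evaluate each query. It is only a toy model, not how any real SPARQL engine works: the names `TRIPLES`, `ALLOWED`, `objects`, `match_base`, `select_with_minus` and `select_with_not_exists` are invented for illustration, and MINUS is simplified to a set difference on `?foo`, which matches its behaviour for this particular query only.

```python
# Toy model of the example graph: a set of (subject, predicate, object) triples.
TRIPLES = {
    ("thing1", "a", "Foo"), ("thing1", "has", "A"), ("thing1", "has", "B"),
    ("thing2", "a", "Foo"), ("thing2", "has", "B"), ("thing2", "has", "A"),
    ("thing3", "a", "Foo"), ("thing3", "has", "A"), ("thing3", "has", "B"),
    ("thing3", "has", "C"),
}

# The only :has values a selected node is allowed to carry.
ALLOWED = {"A", "B"}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def match_base():
    """Subjects matching the pattern: ?foo a :Foo ; :has :A, :B ."""
    foos = {s for s, p, o in TRIPLES if p == "a" and o == "Foo"}
    return {s for s in foos if ALLOWED <= objects(s, "has")}


def select_with_minus():
    """MINUS style: evaluate the base pattern twice, then subtract.

    Simplification: for this query MINUS behaves like a set difference on ?foo.
    """
    left = match_base()
    # The right-hand side repeats the base pattern and adds the "anything else" test.
    right = {s for s in match_base() if objects(s, "has") - ALLOWED}
    return left - right


def select_with_not_exists():
    """FILTER NOT EXISTS style: evaluate the base pattern once, filter each match."""
    return {s for s in match_base() if not (objects(s, "has") - ALLOWED)}


print(sorted(select_with_minus()))       # ['thing1', 'thing2']
print(sorted(select_with_not_exists()))  # ['thing1', 'thing2']
```

Both functions return the same nodes. The visible difference is that the MINUS version calls `match_base()` twice while the NOT EXISTS version calls it once, which is the repeated work described above. A real engine may well optimize that away, so measuring on your own data is still the deciding test.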