Tags: sparql, rdf, stardog

SPARQL: find subjects with the same set of triples?


I am trying to identify subjects that have the exact same "set" of triples. In this example data, :Set2 should be identified as the only exact match for :Set1, while :Set1 and :Set3 are NOT an exact match because :Set3 has the additional value :VAL_E.

@prefix :  <https://www.example.org/Eg#>.

:Set1 :hasValue  :VAL_A, :VAL_B, :VAL_C, :VAL_D .
:Set2 :hasValue  :VAL_A, :VAL_B, :VAL_C, :VAL_D .
:Set3 :hasValue  :VAL_A, :VAL_B, :VAL_C, :VAL_D, :VAL_E .
:Set4 :hasValue  :VAL_A, :VAL_B .
:Set5 :hasValue  :VAL_F, :VAL_G, :VAL_H, :VAL_I, :VAL_J .

I have found example SPARQL on StackOverflow that identifies individual triples that match between :Set1 and the other sets, and even the number of matches, but not how to identify an exact match of a set of triples as a whole. I expect a combination of FILTER NOT EXISTS and !SAMETERM is needed, but I can't get the syntax right.

UPDATE: I adapted the SPARQL from @StanislavKralin to find other Sets identical to :Set1. It almost works.

SELECT DISTINCT  ?s2 {
  :Set1 ?p ?o .
  ?s2 ?p ?o .
  FILTER NOT EXISTS { :Set1 ?p1 ?o1 . FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }
  FILTER NOT EXISTS { ?s2 ?p2 ?o2 . FILTER NOT EXISTS { :Set1 ?p2 ?o2 } } # omits match from :Set3 to :Set1
  FILTER (STR(:Set1) < STR(?s2))
}

However, the result of my query includes :Set4, which is incorrect.

:Set2
:Set4

What am I missing?

UPDATE 2: As noted in the comments below, Stanislav provided further explanation and code on the Stardog Community Forum: https://community.stardog.com/t/unexpected-sparql-filter-results/2745/14 , along with additional information from Pavel Klinov explaining Stardog's current behaviour. As you can read there, a ticket has been opened to resolve the issue. Meanwhile, the following query from Stanislav returns the correct result in Stardog:

SELECT DISTINCT ?s1 ?s2 {
  ?s1 ?p ?o .   
  ?s2 ?p ?o .
  FILTER NOT EXISTS {
    ?s1 ?p ?o . 
    ?s2 ?p ?o . 
    ?s1 ?p1 ?o1 . 
    FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }
  FILTER NOT EXISTS {
    ?s1 ?p ?o . 
    ?s2 ?p ?o .
    ?s2 ?p2 ?o2 . 
    FILTER NOT EXISTS { ?s1 ?p2 ?o2 } }
  FILTER (STR(?s1) < STR(?s2))
}

Solution

  • Tested in Apache Jena Fuseki and Ontotext GraphDB:

    SELECT DISTINCT ?s1 ?s2 {
      ?s1 ?p ?o .
      ?s2 ?p ?o .
      FILTER NOT EXISTS { ?s1 ?p1 ?o1 . FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }
      FILTER NOT EXISTS { ?s2 ?p2 ?o2 . FILTER NOT EXISTS { ?s1 ?p2 ?o2 } }
      FILTER (STR(?s1) < STR(?s2))
    }
    

    Explanation

    Let S1 and S2 be the sets of triples having :s1 and :s2 as subjects respectively.
    What does S1 ≡ S2 mean? That means that S1 ⊆ S2 and S2 ⊆ S1.
    What does S1 ⊆ S2 mean? That means that ∀x(x ∈ S1 → x ∈ S2).
    Unfortunately, SPARQL has nothing like ∀ ('for all').
    However, one can write ¬∃x¬(x ∈ S1 → x ∈ S2) instead, that is, ¬∃x(x ∈ S1 ∧ x ∉ S2), and express it with SPARQL's nested FILTER NOT EXISTS.
    Finally, x ∈ S1 can be translated as :s1 ?p ?o.

    See also this answer.
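
The double-negation argument above can be checked outside a triple store. The following is only a minimal sketch in plain Python, not part of the original answer: it models the question's sample data as a set of (subject, predicate, object) tuples, and the names VALUES, TRIPLES, is_subset and identical_pairs are hypothetical helpers introduced purely for illustration.

```python
from itertools import combinations

# Sample data from the question, as (subject, predicate, object) tuples.
# VALUES and TRIPLES are illustrative names, not from the original post.
VALUES = {
    ":Set1": "ABCD",
    ":Set2": "ABCD",
    ":Set3": "ABCDE",
    ":Set4": "AB",
    ":Set5": "FGHIJ",
}
TRIPLES = {
    (s, ":hasValue", f":VAL_{v}")
    for s, vals in VALUES.items()
    for v in vals
}


def is_subset(s1, s2, triples):
    """S1 is a subset of S2: no (p, o) exists on s1 that is missing on s2.

    This mirrors FILTER NOT EXISTS { ?s1 ?p1 ?o1 . FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }.
    """
    return not any(
        (s2, p, o) not in triples
        for (s, p, o) in triples
        if s == s1
    )


def identical_pairs(triples):
    """Return pairs of subjects whose (predicate, object) sets are equal."""
    subjects = sorted({s for (s, _, _) in triples})
    return [
        (s1, s2)
        # combinations() over sorted subjects yields each pair once with
        # s1 < s2, playing the role of FILTER (STR(?s1) < STR(?s2)).
        for s1, s2 in combinations(subjects, 2)
        if is_subset(s1, s2, triples) and is_subset(s2, s1, triples)
    ]


print(identical_pairs(TRIPLES))
```

On this data, :Set4 is a subset of :Set1 but not the other way round, so only the second subset check excludes it. One difference from the SPARQL query: the query also requires the two subjects to share at least one triple, which this sketch does not check because every subject in the sample data has triples.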