I am trying to identify subjects that have the exact same "set" of triples. In this example data, :Set2 should be identified as the only exact match with :Set1, while :Set1 and :Set3 are NOT an exact match because :Set3 has the additional value :VAL_E.
@prefix : <https://www.example.org/Eg#>.
:Set1 :hasValue :VAL_A, :VAL_B, :VAL_C, :VAL_D .
:Set2 :hasValue :VAL_A, :VAL_B, :VAL_C, :VAL_D .
:Set3 :hasValue :VAL_A, :VAL_B, :VAL_C, :VAL_D, :VAL_E .
:Set4 :hasValue :VAL_A, :VAL_B .
:Set5 :hasValue :VAL_F, :VAL_G, :VAL_H, :VAL_I, :VAL_J .
I have found example SPARQL on StackOverflow that identifies individual triples that match between :Set1 and the other sets, and even the number of matches, but not how to identify an exact match of a set of triples as a whole. I expect a combination of FILTER NOT EXISTS and !SAMETERM is needed, but I can't get the syntax correct.
UPDATE: I adapted the SPARQL from @StanislavKralin to find other Sets identical to :Set1. It almost works.
PREFIX : <https://www.example.org/Eg#>

SELECT DISTINCT ?s2 {
  :Set1 ?p ?o .
  ?s2 ?p ?o .
  FILTER NOT EXISTS { :Set1 ?p1 ?o1 . FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }
  FILTER NOT EXISTS { ?s2 ?p2 ?o2 . FILTER NOT EXISTS { :Set1 ?p2 ?o2 } }  # excludes :Set3, which has a triple that :Set1 lacks
  FILTER (STR(:Set1) < STR(?s2))
}
However, the result of my query includes :Set4, which is incorrect.
:Set2
:Set4
What am I missing?
[Update] As noted in the comments below, Stanislav provided further explanation and code on the Stardog Community Forum (https://community.stardog.com/t/unexpected-sparql-filter-results/2745/14), along with additional information from Pavel Klinov explaining Stardog's current behaviour. As you can read there, a ticket has been opened to resolve the issue. Meanwhile, this query from Stanislav gives the correct result:
SELECT DISTINCT ?s1 ?s2 {
  ?s1 ?p ?o .
  ?s2 ?p ?o .
  FILTER NOT EXISTS {
    ?s1 ?p ?o .
    ?s2 ?p ?o .
    ?s1 ?p1 ?o1 .
    FILTER NOT EXISTS { ?s2 ?p1 ?o1 }
  }
  FILTER NOT EXISTS {
    ?s1 ?p ?o .
    ?s2 ?p ?o .
    ?s2 ?p2 ?o2 .
    FILTER NOT EXISTS { ?s1 ?p2 ?o2 }
  }
  FILTER (STR(?s1) < STR(?s2))
}
Tested in Apache Jena Fuseki and Ontotext GraphDB:
SELECT DISTINCT ?s1 ?s2 {
  ?s1 ?p ?o .
  ?s2 ?p ?o .
  FILTER NOT EXISTS { ?s1 ?p1 ?o1 . FILTER NOT EXISTS { ?s2 ?p1 ?o1 } }
  FILTER NOT EXISTS { ?s2 ?p2 ?o2 . FILTER NOT EXISTS { ?s1 ?p2 ?o2 } }
  FILTER (STR(?s1) < STR(?s2))
}
Explanation
Let S1 and S2 be the sets of triples having :s1 and :s2 as subjects, respectively.
What does S1 ≡ S2 mean? It means that S1 ⊆ S2 and S2 ⊆ S1.
What does S1 ⊆ S2 mean? It means that ∀x(x ∈ S1 → x ∈ S2).
Unfortunately, SPARQL has nothing like ∀ ('for all').
However, one can write ¬∃x¬(x ∈ S1 → x ∈ S2) instead and use SPARQL's NOT EXISTS.
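Spelled out, the rewriting relies on two standard equivalences: ∀x P(x) is the same as ¬∃x ¬P(x), and ¬(A → B) is the same as A ∧ ¬B. The chain below is only a sketch of that step. Strictly speaking, x ranges over predicate-object pairs rather than whole triples, because the subjects themselves differ.

```latex
\begin{align*}
S_1 \subseteq S_2
  &\iff \forall x\,(x \in S_1 \rightarrow x \in S_2) \\
  &\iff \neg\exists x\,\neg(x \in S_1 \rightarrow x \in S_2) \\
  &\iff \neg\exists x\,(x \in S_1 \land x \notin S_2)
\end{align*}
```

Read against the queries above, the outer FILTER NOT EXISTS plays the role of ¬∃x, the pattern ?s1 ?p1 ?o1 plays x ∈ S1, and the inner FILTER NOT EXISTS { ?s2 ?p1 ?o1 } plays x ∉ S2. The second outer filter is the same check with the roles of the two subjects swapped.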
Finally, x ∈ S1 can be translated as :s1 ?p ?o.
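The same double-negation idea can be checked outside a triple store. The sketch below is not SPARQL and is not part of the original answer: it models the sample data from the question as plain Python sets of (predicate, object) pairs, and the names TRIPLES, pairs, is_subset and identical_pairs are made up for illustration.

```python
from itertools import combinations


def pairs(*values):
    """Build the (predicate, object) pairs for one subject of the sample data."""
    return {("hasValue", value) for value in values}


# Sample data from the question, keyed by subject (hypothetical in-memory model).
TRIPLES = {
    "Set1": pairs("VAL_A", "VAL_B", "VAL_C", "VAL_D"),
    "Set2": pairs("VAL_A", "VAL_B", "VAL_C", "VAL_D"),
    "Set3": pairs("VAL_A", "VAL_B", "VAL_C", "VAL_D", "VAL_E"),
    "Set4": pairs("VAL_A", "VAL_B"),
    "Set5": pairs("VAL_F", "VAL_G", "VAL_H", "VAL_I", "VAL_J"),
}


def is_subset(a, b):
    """a is a subset of b: there is NO x in a that is NOT in b."""
    return not any(x not in b for x in a)


def identical_pairs(triples):
    """Return (s1, s2) with s1 < s2 whose predicate-object sets are equal."""
    return [
        (s1, s2)
        # combinations over sorted names mirrors FILTER (STR(?s1) < STR(?s2))
        for s1, s2 in combinations(sorted(triples), 2)
        if is_subset(triples[s1], triples[s2])
        and is_subset(triples[s2], triples[s1])
    ]


print(identical_pairs(TRIPLES))  # [('Set1', 'Set2')]
```

Note that Set4 passes only one direction of the check (Set4 ⊆ Set1 but not Set1 ⊆ Set4), which is why both filters are needed and why Set4 should not appear in the result.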
See also this answer.