Considering a tree with the root node "A" and "hasChild" relationships (e.g. product structure) as following:
Target is to find out: What nodes have parents outside of the tree?
In this case the answer should be ‘B’ and ‘Q’ as they have parents outside of the tree.
The query should go to every node and check its parents rather than creating a list of children and the check every one of them I guess.
How can I efficiently (should work for millions of nodes) traverse this tree an SPARQL and answer this?
This is what i tried but gave 0 results:
PREFIX xxx: <http://example.org/xxx#>
select * where {
xxx:A xxx:hasChild* ?child .
?child ^xxx:hasChild ?foreignParent .
?child ^xxx:hasChild ?parent .
FILTER (?parent =! ?foreignParent) .
}
Attached the respective sample data:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xxx="http://example.org/xxx#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xml:base="http://example.org/xxx">
<owl:Ontology rdf:about="">
<owl:versionInfo>Created with TopBraid Composer</owl:versionInfo>
</owl:Ontology>
<owl:Class rdf:ID="Other">
<rdfs:label>Other</rdfs:label>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<owl:Class rdf:ID="Item">
<rdfs:label>Item</rdfs:label>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<rdf:Property rdf:ID="hasChild">
<rdfs:range rdf:resource="#Item"/>
<rdfs:range rdf:resource="#Other"/>
<rdfs:domain rdf:resource="#Item"/>
<rdfs:label>has child</rdfs:label>
</rdf:Property>
<xxx:Other rdf:ID="Fake_1">
<xxx:hasChild>
<xxx:Item rdf:ID="B">
<xxx:hasChild>
<xxx:Item rdf:ID="D">
<xxx:hasChild>
<xxx:Item rdf:ID="F"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="E"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="C"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<rdfs:label>Fake 1</rdfs:label>
</xxx:Other>
<xxx:Other rdf:ID="Fake_2">
<xxx:hasChild>
<xxx:Item rdf:ID="Q"/>
</xxx:hasChild>
<rdfs:label>Fake 2</rdfs:label>
</xxx:Other>
<xxx:Item rdf:ID="A">
<xxx:hasChild>
<xxx:Item rdf:ID="G">
<xxx:hasChild>
<xxx:Item rdf:ID="X">
<xxx:hasChild>
<xxx:Item rdf:ID="Z"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="Y"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="R">
<xxx:hasChild>
<xxx:Item rdf:ID="W"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="S">
<xxx:hasChild>
<xxx:Item rdf:ID="V"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="U"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="T"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="M">
<xxx:hasChild rdf:resource="#Q"/>
<xxx:hasChild>
<xxx:Item rdf:ID="P"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="O"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="N"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="H">
<xxx:hasChild>
<xxx:Item rdf:ID="L"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="K"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="J"/>
</xxx:hasChild>
<xxx:hasChild>
<xxx:Item rdf:ID="I"/>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
</xxx:Item>
</xxx:hasChild>
<xxx:hasChild rdf:resource="#B"/>
</xxx:Item>
</rdf:RDF>
The trick is to make sure there is no path from your tree root to your foreign parent node. You can do that by means of a FILTER NOT EXISTS
construction, like so:
PREFIX xxx: <http://example.org/xxx#>
SELECT ?child ?foreignParent
WHERE {
xxx:A xxx:hasChild+ ?child.
?child ^xxx:hasChild ?foreignParent.
FILTER NOT EXISTS { xxx:A xxx:hasChild* ?foreignParent }
}
Whether this scales to "millions of nodes" will depend on a) the depth of the tree and b) the triplestore you use. I ran the query using RDF4J on my laptop with the test data you provided, and got this:
Evaluating SPARQL query...
+-------------------------------------+-------------------------------------+
| child | foreignParent |
+-------------------------------------+-------------------------------------+
| <http://example.org/xxx#B> | <http://example.org/xxx#Fake_1> |
| <http://example.org/xxx#Q> | <http://example.org/xxx#Fake_2> |
+-------------------------------------+-------------------------------------+
2 result(s) (19 ms)