sparql rdf amazon-neptune

Limiting cardinality of predicates in SPARQL queries


I'm trying to create a query to match what we call the "shape" of a node in our RDF data. One of the things that goes into this shape is a cardinality for each type of predicate that can link to it.

As a simple example, let's imagine our data is something like this:

PREFIX mynamespace: <https://example.com/namespace/>

mynamespace:node1 mynamespace:hasFoo mynamespace:foo1 ;
                  mynamespace:hasFoo mynamespace:foo2 .

mynamespace:node2 mynamespace:hasFoo mynamespace:foo1 ;
                  mynamespace:hasFoo mynamespace:foo2 ;
                  mynamespace:hasFoo mynamespace:foo3 .

Our shapes are defined in some Scala code, so I'll represent one with pseudocode here:

trait NodeShape {
    val predicateType: String // "hasFoo" for our purposes
    val predicateMinCardinality: Option[Int] // minimum number of hasFoo predicates, if bounded
    val predicateMaxCardinality: Option[Int] // maximum number of hasFoo predicates, if bounded
}

Obviously there's a lot more to the graph pattern than just this, and there may be multiple predicate types that need to be restricted this way, but for now I'm focused on accomplishing this for just one predicate. I haven't been able to find anything in the spec that clearly does this. Basically, I need to be able to write one query that matches node1 without matching node2, and another that matches node2 without matching node1.


Solution

  • You can do this in SPARQL using GROUP BY and HAVING to filter nodes based on how many times a given predicate appears. For example:

    PREFIX mynamespace: <https://example.com/namespace/>
    
    SELECT ?node (COUNT(?foo) AS ?fooCount)
    WHERE {
      ?node mynamespace:hasFoo ?foo .
    }
    GROUP BY ?node
    HAVING (COUNT(?foo) = 2)
    

    This returns only the nodes that have exactly two mynamespace:hasFoo values. With the sample data above, that is node1 but not node2.
    If you want a range instead, you can do something like:

    HAVING (COUNT(?foo) >= 2 && COUNT(?foo) <= 3)
    

    That way you can easily control the minimum and maximum cardinality for each predicate.
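
Since the shapes live in Scala, the query text could be generated from the shape itself. The following is only a minimal sketch under stated assumptions: `PredicateShape`, `CardinalityQuery`, `havingClause` and `query` are hypothetical names, the case class is just a concrete stand-in for the `NodeShape` trait from the question, and `predicateType` is assumed to be a trusted local name, since it is interpolated into the query without escaping. Note that this pattern only groups nodes that have at least one matching triple, so a node with zero occurrences of the predicate never appears in the results, whatever the minimum is.

```scala
// Hypothetical concrete stand-in for the question's NodeShape trait.
final case class PredicateShape(
    predicateType: String,                // e.g. "hasFoo"
    predicateMinCardinality: Option[Int], // None means no lower bound
    predicateMaxCardinality: Option[Int]  // None means no upper bound
)

object CardinalityQuery {
  // Namespace taken from the sample data in the question.
  val Namespace = "https://example.com/namespace/"

  // Aggregate expression repeated inside HAVING rather than a SELECT alias.
  private val CountExpr = "COUNT(?value)"

  // Builds the HAVING clause, or an empty string when neither bound is set.
  def havingClause(shape: PredicateShape): String = {
    val bounds = List(
      shape.predicateMinCardinality.map(min => s"${CountExpr} >= ${min}"),
      shape.predicateMaxCardinality.map(max => s"${CountExpr} <= ${max}")
    ).flatten
    if (bounds.isEmpty) "" else bounds.mkString("HAVING (", " && ", ")")
  }

  // Assembles the full SELECT query for a single predicate.
  def query(shape: PredicateShape): String = {
    val lines = List(
      s"PREFIX mynamespace: <${Namespace}>",
      s"SELECT ?node (${CountExpr} AS ?valueCount)",
      "WHERE {",
      s"  ?node mynamespace:${shape.predicateType} ?value .",
      "}",
      "GROUP BY ?node",
      havingClause(shape)
    )
    // Drop the HAVING line entirely when the shape is unbounded.
    lines.filter(_.nonEmpty).mkString("\n")
  }

  def main(args: Array[String]): Unit =
    println(query(PredicateShape("hasFoo", Some(2), Some(2))))
}
```

With both bounds set to 2, the generated clause is `HAVING (COUNT(?value) >= 2 && COUNT(?value) <= 2)`, which against the sample data should match node1 and skip node2; setting both bounds to 3 does the reverse.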