sparql graphdb named-graphs

Using contexts for multiple data sets in a GraphDB repo


I am working on a research project in which we are studying the success of tooth restoration procedures (i.e., fillings). We are collecting data from a number of dental practices, and we are going to aggregate all the data into a GraphDB triple store. My question is about how to use GraphDB contexts to store all the data in a single repo, while still being able to query each practice individually when needed. I am using the Free edition of GraphDB, version 7.0.3, and the GraphDB Workbench.

When I import data in the repo, it gives me the option to specify a context. My understanding is that this is essentially a subgraph of the whole RDF graph. Right? But I am a little unsure as to how this differs from the base URI. In the example page, both the base URI and context are the same.

My general thought on how to set the repo up is to give it a base URI, and keep the base URI the same for each practice data set, but change the context when loading each practice. For example:

... and so on ...

To query the aggregate of all the data (I suppose), I would use a SPARQL query that doesn't specify a graph. For example, to find all patients:

select ?patient where { ?patient rdf:type :Patient }

But how would I query a particular practice? Would I specify a graph or use the "from" keyword? For example:

select ?patient from <practice-1> where { ?patient rdf:type :Patient }

or

select ?patient where { graph <practice-1> { ?patient rdf:type :Patient } }

Finally, does anyone know whether there is a page or documentation explaining how to use a context effectively?


Solution

  • You could use either approach for querying specific graphs. Both the GRAPH keyword and the FROM keyword do roughly the same thing in this case. However, the FROM variant is possibly faster, since it is a little easier for the query planner to optimize.

    Some background: in SPARQL, the FROM (and FROM NAMED) clauses specify the dataset over which a query ranges, while the GRAPH keyword simply "zooms in" on a subset of the currently queried dataset. If the FROM clause is left out, the query is evaluated over the database's default dataset. In GraphDB, the default SPARQL dataset includes all named graphs available in the database, which is why the GRAPH keyword and the FROM keyword do the same thing in this case. Note that this is store-specific, though: other databases can and do choose to define the default dataset differently.
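
    To illustrate how GRAPH "zooms in" on the dataset, here is a minimal sketch that uses a variable instead of a fixed graph name, so each patient is returned together with the named graph (practice) it was loaded into. The `:` namespace `http://example.org/dental#` is an assumption, since the question does not say which namespaces it uses, and the query assumes each practice was imported with its own context.

    ```sparql
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    # Hypothetical namespace for the :Patient class; replace with your own.
    PREFIX : <http://example.org/dental#>

    # No FROM clause, so the query ranges over the store's default dataset.
    # GRAPH ?practice binds ?practice to the named graph holding each match.
    SELECT ?practice ?patient
    WHERE {
      GRAPH ?practice { ?patient rdf:type :Patient }
    }
    ```

    Replacing `?practice` with a fixed graph IRI gives the single-practice query from the question.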

    As an aside: none of this has anything to do with the base URI. A base URI is simply a syntactical mechanism used when resolving relative URI references in your input data. RDF databases like GraphDB don't actually store relative URIs, so the parser uses the base URI to turn any relative URIs in your data into absolute ones before adding them to the database.
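
    The same resolution rule applies to relative references inside a query, such as `<practice-1>` in the question's examples. As a sketch, assuming the base `http://example.org/` and the hypothetical `:` namespace `http://example.org/dental#` (neither is given in the question), the SPARQL BASE keyword makes the resolution explicit:

    ```sparql
    # Assumed base; relative IRIs in this query are resolved against it.
    BASE <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    # Hypothetical namespace for the :Patient class; replace with your own.
    PREFIX : <http://example.org/dental#>

    # <practice-1> resolves to <http://example.org/practice-1>.
    SELECT ?patient
    FROM <practice-1>
    WHERE { ?patient rdf:type :Patient }
    ```

    This only matches data if the context chosen at import time was the same absolute IRI, so it may be safer to write graph names out in full.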

    For further reading, I'd recommend GraphDB's own documentation about query behaviour. There's also a section about named graphs in the RDF4J Programming documentation - GraphDB is closely linked with the RDF4J APIs so it follows most of its conventions.