The SPARQL specification mentions that the FROM clause can be used to specify a dataset:
"A SPARQL query may specify the dataset to be used for matching by using the FROM clause and the FROM NAMED clause to describe the RDF dataset."
What is a "dataset" in the context of SPARQL? I'm very familiar with databases in general, and I understand in principle that a query for data phrased in a language such as SQL is then executed against a dataset to resolve some subset of that dataset.
I'm trying to understand the following query:
prefix cpmeta: <...some_domain>
select distinct
  ?uri
  ?label
  ?stationId
from <...some_domain>
from <...some_domain>
from <...some_domain>
from <...some_domain>
from named <...some_domain>
where {
  { ?uri rdfs:label ?label }
  UNION
  { ?uri cpmeta:hasName ?label }
  UNION
  {
    graph <...some_domain> {
      ?uri a cpmeta:Station .
      ?uri cpmeta:hasName ?label .
    }
  }
  ?uri cpmeta:hasStationId ?stationId
}
limit 100
So from the specification documentation I understand in principle that
However, the query still executes (with slightly different results) if I leave out the FROM and FROM NAMED clauses:
prefix cpmeta: <...some_domain>
select distinct
  ?uri
  ?label
  ?stationId
where {
  { ?uri rdfs:label ?label }
  UNION
  { ?uri cpmeta:hasName ?label }
  UNION
  {
    graph <...some_domain> {
      ?uri a cpmeta:Station .
      ?uri cpmeta:hasName ?label .
    }
  }
  ?uri cpmeta:hasStationId ?stationId
}
limit 100
So clearly (?) there is already a dataset specified. Is that via the prefix?
Questions:
RDF dataset identified differently to a regular dataset (FROM vs FROM NAMED)
FROM statement. What is the difference between a prefix and a FROM clause?
This question - Specifying dataset within a SPARQL query - shows how to specify a dataset, but doesn't explain what that means in the context of a SPARQL query, or in the context of how that SPARQL query is resolved to actual data.
This question - FROM clause in SPARQL queries - mentions that a SPARQL query without a FROM clause is executed against a default dataset. But then why would omitting all datasets still result in data returned by the query?
Comparing the execution of a SPARQL query with that of an SQL query is a bit tricky. SPARQL is higher level.
Datasets
An endpoint (e.g. a database like Virtuoso or GraphDB) has some freedom in whether and how it implements SPARQL concepts.
The dataset is one such concept. Usually a graph database lets you create a repository, which is the equivalent of a database in the SQL world. Triples are stored inside this repository, and they can be grouped into named graphs. The GRAPH construct helps you select which of these graphs to look in.
The repository is the dataset you are referring to.
Very few databases support querying datasets/repositories that are not hosted in that same database, for fairly obvious reasons.
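As a rough sketch of how FROM and FROM NAMED describe the dataset for a single query: per the SPARQL 1.1 specification, the graphs listed with FROM are merged to form the default graph, while graphs listed with FROM NAMED are only reachable through GRAPH. The graph and property IRIs under http://example.org/ are hypothetical placeholders, not taken from the question.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label ?g ?name
# Both graphs below are merged to form the default graph of this query.
FROM <http://example.org/graphs/a>
FROM <http://example.org/graphs/b>
# This graph is not merged; it is only reachable through GRAPH.
FROM NAMED <http://example.org/graphs/c>
WHERE {
  # Matched against the default graph (a merged with b).
  ?s rdfs:label ?label .
  # Matched against the named graph(s) only, here just c.
  GRAPH ?g { ?s <http://example.org/ontology/hasName> ?name }
}
LIMIT 10
```

When a query has no FROM or FROM NAMED at all, the endpoint falls back to whatever dataset it uses by default. What that default contains is up to the implementation, which is why the query in the question still returns data, and why the results can differ slightly from the version with explicit FROM clauses.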
SPARQL
The less precise your query, the more data it is matched against. Using GRAPH <...> {} narrows down the set of triples a pattern is matched against, without the need to write a full subquery.
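To see which named graphs an endpoint exposes when no FROM clauses are given, a query along the following lines can help. It is only a sketch: it uses standard SPARQL 1.1 aggregates, but which graphs show up depends entirely on how the endpoint configures its default dataset.

```sparql
# List every named graph in the endpoint's dataset with its triple count.
SELECT ?g (COUNT(*) AS ?triples)
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY DESC(?triples)
```

Replacing the variable ?g with a concrete graph IRI restricts the pattern inside the braces to that one graph.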
Don't confuse datasets with namespaces. IDs in the world of RDF are always URIs. The first part of a URI usually names the organisation that invented the ID, but the URI as a whole is still just an ID. Using prefixes makes the IDs look shorter.
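A small illustration of that point, using a hypothetical namespace http://example.org/ontology/: a PREFIX declaration only abbreviates IRIs inside the query text. It does not select or load any data, so it plays no part in choosing the dataset.

```sparql
PREFIX ex: <http://example.org/ontology/>

SELECT ?s
WHERE {
  # The two patterns are equivalent: ex:Station is shorthand for the full IRI.
  { ?s a ex:Station }
  UNION
  { ?s a <http://example.org/ontology/Station> }
}
```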
You could put each triple in a separate graph, which turns the name of the graph into an identifier of the triple. This is not intended, but also not forbidden usage.