I am trying to download a number of named graphs from my GraphDB repository using the API. Simply put, I want to retrieve the entire content of a named graph, in a serialization of choice (Turtle and JSON-LD).
My first approach uses a CONSTRUCT query:
PREFIX ex: <http://example.com/ns#>
CONSTRUCT {?s ?p ?o}
WHERE {graph <http://example.com/ns#id/>
{?s ?p ?o .}}
I'm using the Python SPARQLWrapper library here. It returns results, but they contain triples from ALL my graphs, not just http://example.com/ns#id/. Furthermore, if I add a filter such as filter (?o = "weafsdghadifhilaerjgak"), I still get the same result (all triples from all graphs), which makes me think the query is not actually being run.
I tried opening a fresh notebook with no possible contamination of the query variable, but I still get the same result.
Running the query in GraphDB Workbench gives the expected result. I'm looking for pointers on why the GraphDB Workbench result and the SPARQLWrapper result could differ.
Here is the code I'm using, minus some specifics for security's sake:
from SPARQLWrapper import SPARQLWrapper, BASIC, QueryResult
from rdflib import Graph
db = SPARQLWrapper("myendpoint:7300/repositories/myrepo/statements")
query = '''
PREFIX ex: <http://example.com/ns#>
CONSTRUCT {?s ?p ?o}
WHERE {graph <http://example.com/ns#id/>
{?s ?p ?o .
#filter (?o = "somestringthatdefinitelydoesnotexist")
}}
'''
db.setHTTPAuth(BASIC)
db.setCredentials('my', 'credentials')
db.setQuery(query)
db.method = "GET"
db.setReturnFormat('json-ld')
db.queryType = "CONSTRUCT"
result = db.query()
jsonresult=(result._convertJSONLD())
# v = jsonresult.serialize(format='json-ld')
v = jsonresult.serialize(format='turtle')
print(query)
print(v)
When using SPARQLWrapper, the endpoint URL for GET queries (such as SELECT or CONSTRUCT) has to be structured as follows:
myendpoint:port/repositories/{repositoryID}
For POST queries (updating, inserting), the endpoint is different:
myendpoint:port/repositories/{repositoryID}/statements
Running a GET query against the POST endpoint URL (the one ending in /statements) does not raise an error; it retrieves ALL triples, regardless of the query you attach. That is also why my filter had no impact on the query results.
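The two URL shapes above can be sketched as follows. This is only a minimal sketch under stated assumptions: the helper names query_endpoint, update_endpoint and fetch_named_graph are made up for illustration, the host and repository ID are placeholders, and it assumes the server answers the CONSTRUCT query with JSON-LD so that convert() hands back an rdflib graph.

```python
def query_endpoint(base_url: str, repo_id: str) -> str:
    """URL for read queries (SELECT, CONSTRUCT): no /statements suffix."""
    return f"{base_url.rstrip('/')}/repositories/{repo_id}"


def update_endpoint(base_url: str, repo_id: str) -> str:
    """URL for SPARQL updates (INSERT, DELETE): ends in /statements."""
    return query_endpoint(base_url, repo_id) + "/statements"


def fetch_named_graph(base_url, repo_id, graph_iri, user, password):
    """Fetch one named graph and serialize it as Turtle (hypothetical helper)."""
    # Imported lazily so the URL helpers above work without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, BASIC, GET, JSONLD

    db = SPARQLWrapper(
        query_endpoint(base_url, repo_id),
        updateEndpoint=update_endpoint(base_url, repo_id),
    )
    db.setHTTPAuth(BASIC)
    db.setCredentials(user, password)
    db.setMethod(GET)
    db.setReturnFormat(JSONLD)
    db.setQuery(
        "CONSTRUCT { ?s ?p ?o } "
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o . }} }}"
    )
    # Assumption: for a JSON-LD response, convert() parses into an rdflib graph.
    graph = db.query().convert()
    return graph.serialize(format="turtle")
```

With this layout, a call such as fetch_named_graph("http://myendpoint:7300", "myrepo", "http://example.com/ns#id/", "my", "credentials") sends the CONSTRUCT query to the repository URL, while /statements is kept only as the update endpoint.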
In my opinion this is a silly complexity: after all, GraphDB Workbench itself parses your query, figures out whether it's a GET or POST situation, and handles the API just fine.
TL;DR: Use /statements ONLY when doing a POST query (SPARQL INSERT).
This is somewhat described in GraphDB's REST API documentation, but it was not clear to me that running the wrong "type" of query against an endpoint does not give an error, but rather just returns all statements in the repository.
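Because a GET on /statements returns statements rather than running the attached query, one possible alternative for downloading a whole named graph is to call that endpoint directly and narrow it with a context parameter. The sketch below uses only the standard library and rests on assumptions that should be checked against the REST API documentation: that the server follows the RDF4J-style convention of a context parameter holding the graph IRI in angle brackets, and that the Accept header selects the serialization. The names statements_url and download_graph are hypothetical, and authentication is left out for brevity.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def statements_url(base_url: str, repo_id: str, graph_iri: str) -> str:
    """Build a /statements URL restricted to one named graph (assumed convention)."""
    # The context value is the IRI wrapped in angle brackets, then percent-encoded.
    context = quote(f"<{graph_iri}>", safe="")
    return f"{base_url.rstrip('/')}/repositories/{repo_id}/statements?context={context}"


def download_graph(base_url, repo_id, graph_iri, accept="text/turtle"):
    """Download the named graph in the requested serialization (no auth handling)."""
    request = Request(
        statements_url(base_url, repo_id, graph_iri),
        headers={"Accept": accept},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Passing accept="application/ld+json" instead of the default would request JSON-LD, provided the server supports that media type on this endpoint.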