I'm quite puzzled by the behavior of my endpoint, or rather by the processing of the request. The basic RDFS namespace seems to clash with another declaration at query time: declaring the prefix results in an error, while omitting it from the query body yields normal output.
Query 1 (prefix omitted):
SELECT *
WHERE {
?sub rdfs:label ?p .
} LIMIT 5
Output 1:
INFO:root: sub p
0 http://example.org/triples/17bbab96 Pont d Iéna-9423efbc
1 http://example.org/triples/37d3fba1 Pont d Iéna-9423efbc
2 http://example.org/triples/e8a8921a Pont Transbordeur-fb62b01e
3 http://example.org/triples/7907d1de Pont Transbordeur-fb62b01e
4 http://example.org/triples/5b529b5e Pont d Iéna-98cdd2fc
Query 2 (prefix declared):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
?sub rdfs:label ?p .
} LIMIT 5
Output 2 (Client Side):
(...)
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(...)
ValueError: You did something wrong formulating either the URI or your SPARQL query
Output 2 (Server Side):
[INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'
import os
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
from dotenv import load_dotenv
load_dotenv()
from rdflib import Graph
from rdflib.plugins.stores import sparqlstore
from rdflib.plugins.sparql.processor import SPARQLResult
from requests.auth import HTTPDigestAuth
from pandas import DataFrame
def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )
if __name__ == '__main__':
    store = sparqlstore.SPARQLUpdateStore(
        query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
        update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
        # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,
    )
    g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'])  # namespace_manager=None
    q_sa = """
    select * where {
        ?s ?p ?o .
    } limit 20
    """
    q_sa2 = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
        ?sub rdfs:label ?p .
    } LIMIT 20
    """
    qr = g.query(q_sa2)
    df = sparql_results_to_df(qr)
    logging.info(df)
I'd have expected the opposite: Query 1 failing with an "undefined prefix" error, and Query 2 retrieving my results. Is there a way to get that behavior by changing something client-side or server-side? Is this a bad idea? (I prefer to have everything in the queries, even the most basic namespaces.)
I'd be glad to read your thoughts on that. Thanks in advance for your answers!
Thanks @UninformedUser, you've put me on the right track! It was hard to figure out where the error originated (rdflib's Graph? SPARQLStore? the endpoint config?).
Alas, passing an empty initNs doesn't work, because in the source it is overridden with the graph's default namespaces: initNs = initNs or dict(self.namespaces())  # noqa: N806
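The effect of that fallback can be sketched like this (a hypothetical helper, not rdflib's actual internals): every namespace bound on the graph gets injected as a PREFIX declaration ahead of the query text, so a query that already declares rdfs arrives at the endpoint with two declarations, which GraphDB rejects.

```python
def build_query(user_query: str, init_ns: dict) -> str:
    # Sketch: prepend one PREFIX declaration per bound namespace
    # (what the initNs fallback amounts to), then append the user's
    # query text unchanged.
    prefix_block = "".join(f"PREFIX {p}: <{ns}>\n" for p, ns in init_ns.items())
    return prefix_block + user_query

init_ns = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
query = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5"
)
sent = build_query(query, init_ns)
print(sent.count("PREFIX rdfs:"))  # -> 2: one injected, one from the query itself
```

This also explains why Query 1 "works": the injected declarations silently supply the rdfs prefix the query body omitted.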
Looking at the "Namespace bindings" section of the rdflib docs, each graph ships with default namespace bindings.
The solution, then, is to override the default graph config: g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")
Solved! (I'll mark it as accepted in 2 days.)