I know the basics of programming, but I am completely new to RDF and SPARQL, so I hope what follows is clear. I am trying to download some data available at http://data.camera.it/data/en/datasets/. All the data are in RDF/XML format and organized in an ontology.
I noticed this website has an online SPARQL Query Editor (http://dati.camera.it/sparql), and using some of their examples I was able to retrieve and convert some of the data I need with Python. I used the following code and query, with SPARQLWrapper:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dati.camera.it/sparql")
sparql.setQuery(
'''
SELECT distinct ?deputatoId ?cognome ?nome ?data ?argomento ?titoloSeduta ?testo
WHERE {
?dibattito a ocd:dibattito; ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_17>.
?dibattito ocd:rif_discussione ?discussione.
?discussione ocd:rif_seduta ?seduta.
?seduta dc:date ?data; dc:title ?titoloSeduta.
?seduta ocd:rif_assemblea ?assemblea.
?discussione rdfs:label ?argomento.
?discussione ocd:rif_intervento ?intervento.
?intervento ocd:rif_deputato ?deputatoId; dc:relation ?testo.
?deputatoId foaf:firstName ?nome; foaf:surname ?cognome .
}
ORDER BY ?data ?cognome ?nome
LIMIT 100
'''
)
sparql.setReturnFormat(JSON)
results_raw = sparql.query().convert()
However, I have a problem: the website only allows downloading at most 10,000 values. As far as I understand, this limit cannot be changed, so I decided to download the datasets to my computer. I tried to work with all these RDF files, but I don't know how to do it since, as far as I know, SPARQLWrapper does not work with local files.
So my questions are:
Any suggestion on how to tackle the problem is appreciated. Thank you!
Download all the RDF/XML files from their download area, and load them into a local instance of Virtuoso (which happens to be the engine they are using for their public SPARQL endpoint). You will have the advantage of running a much more recent version (v7.2.5.1 or later, whether Open Source or Enterprise Edition) than the one they've got (Open Source v7.1.0, from March 2014!).
Use your new local SPARQL endpoint, found at http://localhost:8890/sparql by default. You can configure it to impose no limits on result set size, query runtime, or anything else.
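As a rough sketch of what that could look like from Python, the SPARQLWrapper code from the question should only need a different endpoint URL. This assumes a local Virtuoso instance is running on the default port with the RDF/XML files already loaded. The names `LOCAL_ENDPOINT`, `flatten_bindings`, and `run_query` are made up for illustration and are not part of SPARQLWrapper or Virtuoso.

```python
# Assumption: local Virtuoso on its default port, data already loaded.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def flatten_bindings(results):
    """Turn a SPARQL JSON result document into a list of plain dicts.

    Each binding maps a variable name to {"type": ..., "value": ...};
    only the "value" part is kept here.
    """
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query, endpoint=LOCAL_ENDPOINT):
    """Send a SELECT query to the endpoint and return flattened rows."""
    # Imported here so flatten_bindings can be used (and tested)
    # even where SPARQLWrapper is not installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return flatten_bindings(sparql.query().convert())


if __name__ == "__main__":
    # Sanity check that the load worked: count all triples in the store.
    rows = run_query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
    print(rows)
```

Note that the public endpoint may predefine prefixes such as `ocd:`. A fresh local instance probably will not, so the original query would likely need explicit `PREFIX` declarations before it runs locally.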
Seems likely.
(P.S. You might encourage the folks at dati.camera.it (assistenza-dati@camera.it) to upgrade their Virtuoso instance. There are substantial performance and feature enhancements awaiting!)