python sparql rdf ontology sparqlwrapper

SPARQL query on multiple RDF files


I have some programming basics, but I am completely new to RDF and SPARQL, so I hope what follows is clear. I am trying to download some data available at http://data.camera.it/data/en/datasets/. All the data are in RDF/XML format and are organized in an ontology.

I noticed that the website has an online SPARQL query editor (http://dati.camera.it/sparql). Using some of their examples, I was able to retrieve and convert some of the data I need with Python. I used the following code and query with SPARQLWrapper:

from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dati.camera.it/sparql")
sparql.setQuery(
    '''
    SELECT DISTINCT ?deputatoId ?cognome ?nome ?data ?argomento ?titoloSeduta ?testo
    WHERE {
    ?dibattito a ocd:dibattito; ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_17>.

    ?dibattito ocd:rif_discussione ?discussione.
    ?discussione ocd:rif_seduta ?seduta.
    ?seduta dc:date ?data; dc:title ?titoloSeduta.
    ?seduta ocd:rif_assemblea ?assemblea.

    ?discussione rdfs:label ?argomento.

    ?discussione ocd:rif_intervento ?intervento.
    ?intervento ocd:rif_deputato ?deputatoId; dc:relation ?testo. 
    ?deputatoId foaf:firstName ?nome; foaf:surname ?cognome .
    }

    ORDER BY ?data ?cognome ?nome
    LIMIT 100
    '''
)
sparql.setReturnFormat(JSON)
results_raw = sparql.query().convert()

However, I have a problem: the website only lets me download 10,000 values, and as far as I understand, this limit cannot be changed. I therefore decided to download the datasets to my computer. I would like to work with all these RDF files, but I don't know how, since, as far as I know, SPARQLWrapper does not work with local files.

So my questions are:

  1. How do I create a dataset containing all the RDF files so that I can work on them as if it were a single object?
  2. How do I query such an object to retrieve the information I need? Is that possible?
  3. Is this way of reasoning the right approach?

Any suggestion on how to tackle the problem is appreciated. Thank you!


Solution

    1. Download all the RDF/XML files from their download area, and load them into a local instance of Virtuoso (which happens to be the engine they use for their public SPARQL endpoint). You will have the advantage of running a much more recent version (v7.2.5.1 or later, whether Open Source or Enterprise Edition) than the one they have (Open Source v7.1.0, from March 2014!).

    2. Use your new local SPARQL endpoint, found at http://localhost:8890/sparql by default. You can configure it with no limits on result set size, query runtime, or anything else.

    3. Seems likely.

    (P.S. You might encourage the folks at dati.camera.it (assistenza-dati@camera.it) to upgrade their Virtuoso instance. There are substantial performance and feature enhancements awaiting!)
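
To illustrate point 2, here is a minimal sketch of querying the local endpoint from Python. It assumes Virtuoso is running on the default URL mentioned above and that the RDF files have already been loaded. It uses only the standard library and the standard SPARQL Protocol (a GET request with a `query` parameter); the helper names `build_request` and `run_select` are made up for this example. The triple-count query is just a sanity check that the load worked.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Virtuoso instance on its default SPARQL endpoint.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"

# Sanity-check query: how many triples are visible to the endpoint?
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query, timeout=60):
    """Send a SELECT query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Usage (requires the local endpoint to be up):
# rows = run_select(LOCAL_ENDPOINT, COUNT_QUERY)
# print(rows[0]["triples"]["value"])
```

The SPARQLWrapper code from the question should work the same way if you pass `http://localhost:8890/sparql` to the `SPARQLWrapper(...)` constructor. One caveat: a fresh local instance may not have the `ocd:` prefix predefined, in which case the query needs explicit `PREFIX` declarations.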