sparql rdf geojson linked-data opendata

Downloading GeoJSON boundaries using SPARQL from publicly available data


I'd like to download some boundary files from statistics.gov.scot, an official repository for sharing statistical data that can be queried with SPARQL.

Background

Statistics.gov.scot provides access to GeoJSON boundaries for a number of administrative and statistical geographies, such as local authority boundaries or health boards. In my case, I'm interested in downloading a data set with the GeoJSON boundaries of data zones. Data zones are statistical geographies developed for disseminating life outcomes data at a small-area level. When accessed via statistics.gov.scot, a sample data zone looks like this:

Sample data zone

The geography and the related data can be accessed here. The corresponding GeoJSON data is available here.

Problem

Data zones are available in two iterations: one produced in 2004 and another updated recently. I would like to download the first iteration, produced in 2004. Following the information on the statistical entities, I drafted the following query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT ?boundary 
    WHERE {
        entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
  }

LIMIT 1000

which returns the following error message:

Error There was a syntax error in your query: Encountered " "}" "} "" at line 7,
column 3. Was expecting one of: <IRIref> ... <PNAME_NS> ... <PNAME_LN> ...
<BLANK_NODE_LABEL> ... <VAR1> ... <VAR2> ... "true" ... "false" ... <INTEGER> ...
<DECIMAL> ... <DOUBLE> ... <INTEGER_POSITIVE> ... <DECIMAL_POSITIVE> ...
<DOUBLE_POSITIVE> ... <INTEGER_NEGATIVE> ... <DECIMAL_NEGATIVE> ...
<DOUBLE_NEGATIVE> ... <STRING_LITERAL1> ... <STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ... <STRING_LITERAL_LONG2> ... "(" ... <NIL> ... "[" ...
<ANON> ... "+" ... "*" ... "/" ... "|" ... "?" ...

when tested via the endpoint: http://statistics.gov.scot/sparql.

Comments

Ideally, I would like to develop other queries that would let me source other statistical geographies by using the entity: prefix. This should be possible, as the entity: vocabulary should contain information on the available geographies (name, acronym, date of creation).


The query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT DISTINCT ?boundary ?shape WHERE {
  ?shape entity:firstcode ?boundary
}

LIMIT 1000

gets me something that looks like a list of the desired geographies, but I'm struggling to source the GeoJSON boundaries.


Solution

  • Neither statistics.gov.scot nor statistics.data.gov.uk contains data zone boundaries as WKT or string literals.

    However, with the following query, you can construct the URLs of the GeoJSON files that are used on the resource pages:

    PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
    PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
    PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
    PREFIX pref4: <http://reference.data.gov.uk/id/day/>
    PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
    PREFIX pref6: <http://statistics.gov.scot/id/statistical-geography/>
    PREFIX pref7: <http://statistics.gov.scot/boundaries/>
    
    SELECT ?zone ?name ?json {
       ?zone pref1:code pref2:S01 .
       ?zone pref3:operativedate pref4:2004-02-01
       OPTIONAL { ?zone pref5:officialname ?name }
       BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
    } ORDER BY (!bound(?name)) ASC(?name)
    

    After that, you can retrieve the GeoJSON files with wget -i or a similar tool.
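
If you would rather build the URL list in a script than in the query itself, the BIND expression above can be mirrored client-side. The following is only a sketch under a few assumptions: the endpoint returns standard SPARQL JSON results when asked via the Accept header, the helper names (geojson_url, column, url_list_text, run_select) are made up for this example, and the zone code in the usage note is just an illustrative value.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://statistics.gov.scot/sparql"  # endpoint named in the question
GEOGRAPHY_BASE = "http://statistics.gov.scot/id/statistical-geography/"  # pref6:
BOUNDARY_BASE = "http://statistics.gov.scot/boundaries/"  # pref7:


def geojson_url(zone_uri):
    """Client-side equivalent of CONCAT(REPLACE(STR(?zone), pref6:, pref7:), ".json")."""
    return zone_uri.replace(GEOGRAPHY_BASE, BOUNDARY_BASE) + ".json"


def column(results, var):
    """Values of one variable from SPARQL JSON results, skipping unbound rows."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


def url_list_text(results):
    """One GeoJSON URL per line, i.e. the file format that `wget -i` reads."""
    return "\n".join(geojson_url(zone) for zone in column(results, "zone")) + "\n"


def run_select(query, endpoint=ENDPOINT):
    """Send a SELECT query over HTTP GET (assumes the endpoint honours the Accept header)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Writing url_list_text(run_select(query)) to a file such as urls.txt and then running wget -i urls.txt should fetch the boundaries. For example, geojson_url("http://statistics.gov.scot/id/statistical-geography/S01000001") gives "http://statistics.gov.scot/boundaries/S01000001.json".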

    Some explanation

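    Your query fails to parse because its triple pattern has only two terms (a predicate and an object, but no subject), so the parser is still expecting a third term when it reaches the closing brace. Beyond that, you should use <http://statistics.data.gov.uk/def/boundary-change/operativedate> instead of <http://statistics.data.gov.uk/def/statistical-entity#introduced>. The latter is more of a class-level property: it appears to be attached to the statistical entities (the types of geography) rather than to individual zones, as this query shows: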

    SELECT * WHERE {
        ?S <http://statistics.data.gov.uk/def/statistical-entity#introduced> ?date .
        ?S <http://www.w3.org/2000/01/rdf-schema#label> ?label
    }
    

    The second-generation data zones are dated 2014-11-06, as the following count per date shows:

    SELECT ?date (COUNT(?zone) AS ?count) WHERE {
        ?zone
            <http://statistics.data.gov.uk/def/statistical-entity#code>
                <http://statistics.gov.scot/id/statistical-entity/S01> ;
            <http://statistics.data.gov.uk/def/boundary-change/operativedate>
                ?date 
    } GROUP BY ?date
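
To make the per-date counts easier to work with in a script, the aggregate rows can be turned into a small dictionary. This is a minimal sketch that assumes the results arrive in the standard SPARQL JSON format and that each ?date is a day URI ending in an ISO date, as in the pref4: namespace above. The function names are hypothetical, and no real counts are shown here.

```python
from datetime import date


def day_from_uri(uri):
    """Parse the trailing ISO date of a reference.data.gov.uk day URI."""
    return date.fromisoformat(uri.rstrip("/").rsplit("/", 1)[-1])


def counts_by_date(results):
    """Map each operative date to its zone count (COUNT comes back as a numeric string)."""
    return {
        day_from_uri(row["date"]["value"]): int(row["count"]["value"])
        for row in results["results"]["bindings"]
    }
```

Calling day_from_uri("http://reference.data.gov.uk/id/day/2004-02-01") returns date(2004, 2, 1), so the two generations can be told apart by comparing ordinary date objects.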
    

    Analogously, if you need the URLs of the corresponding GeoJSON files, your query (with the same PREFIX declarations as in the first query) should be:

    SELECT ?zone ?name ?json {
       ?zone pref1:code pref2:S01 .
       ?zone pref3:operativedate pref4:2014-11-06 .
       ?zone pref5:officialname ?name 
       BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
    } ORDER BY ASC(?name)
    

    You do not need OPTIONAL, because all second generation data zones have "official names".
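
For the first-generation query, where the name is OPTIONAL, the clause ORDER BY (!bound(?name)) ASC(?name) sorts named zones first (false orders before true) and leaves unnamed zones at the end. If you fetch the rows unsorted, roughly the same ordering can be reproduced client-side. This is a sketch only: order_named_first is a made-up helper, the rows are assumed to be SPARQL JSON bindings, and Python's string comparison may differ from the endpoint's collation for non-ASCII names.

```python
def order_named_first(rows):
    """Mimic ORDER BY (!bound(?name)) ASC(?name) on SPARQL JSON bindings.

    The sort key is (name is missing, name): rows with a name come first,
    sorted by that name, and rows without a name keep their relative order
    at the end because sorted() is stable.
    """
    return sorted(
        rows,
        key=lambda row: ("name" not in row, row.get("name", {}).get("value", "")),
    )
```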


    This page on data.gov.uk may also be of interest to you.
    There also exists opendata.stackexchange.com for questions related to open data.

    Update

    As of May 2018, one can retrieve data zone boundaries as WKT:

    PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
    PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
    PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
    PREFIX pref4: <http://reference.data.gov.uk/id/day/>
    PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
    PREFIX pref6: <http://www.opengis.net/ont/geosparql#>
    
    
    SELECT ?zone ?name ?geometry {
       ?zone pref1:code pref2:S01 .
       ?zone pref3:operativedate pref4:2014-11-06 .
       ?zone pref5:officialname ?name .
       ?zone pref6:hasGeometry/pref6:asWKT ?geometry .
    } ORDER BY ASC(?name)
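
If you still want GeoJSON rather than WKT, the ?geometry literals can be converted client-side. Below is a rough, dependency-free sketch that assumes the literals are POLYGON or MULTIPOLYGON, optionally prefixed with a CRS IRI in angle brackets as GeoSPARQL allows. The function names are invented for this example, and no reprojection is attempted, so the coordinates stay in whatever reference system the endpoint uses. A library such as shapely would be more robust.

```python
import re


def split_crs(wkt_literal):
    """Separate an optional leading <CRS IRI> from the WKT text."""
    text = wkt_literal.strip()
    match = re.match(r"<([^>]*)>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1), match.group(2)
    return None, text


def _ring(text):
    # "x y, x y, ..." -> [[x, y], ...]
    return [[float(n) for n in point.split()] for point in text.split(",")]


def _polygon(text):
    # "(ring), (ring)" -> list of rings (outer ring first, then holes)
    return [_ring(ring) for ring in re.findall(r"\(([^()]*)\)", text)]


def wkt_to_geojson(wkt_literal):
    """Convert a POLYGON or MULTIPOLYGON WKT literal to a GeoJSON geometry dict."""
    _, wkt = split_crs(wkt_literal)
    match = re.match(
        r"\s*(MULTIPOLYGON|POLYGON)\s*\((.*)\)\s*$", wkt, re.IGNORECASE | re.DOTALL
    )
    if not match:
        raise ValueError("this sketch only handles POLYGON and MULTIPOLYGON")
    kind, body = match.group(1).upper(), match.group(2)
    if kind == "POLYGON":
        return {"type": "Polygon", "coordinates": _polygon(body)}
    # MULTIPOLYGON body looks like "((ring), (ring)), ((ring))"
    polygons = re.findall(r"\(((?:\s*\([^()]*\)\s*,?)+)\)", body)
    return {"type": "MultiPolygon", "coordinates": [_polygon(p) for p in polygons]}
```

Each converted geometry can then be wrapped in a GeoJSON Feature with the zone URI and name as properties and written out with json.dump.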