query-optimization sparql geosparql

SPARQL DBpedia query for seating capacity: optimize and remove duplicates


I want to get all objects with seating capacity information on DBPedia. Optionally, I want to get their label, address, lat and lon information.

My issue is that I get a lot of duplicates even after filtering by language. How can I get distinct entries based on, say, 'address', or any other attribute?

Also, can you tell me which part of this query could be improved so that it doesn't time out on the public DBpedia endpoint? Thanks!

    PREFIX dbpediaO: <http://dbpedia.org/ontology/>
    PREFIX dbpedia2: <http://dbpedia.org/property/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

    SELECT ?place ?label ?capacity ?address ?lat ?lon WHERE {

      ?place dbpedia2:seatingCapacity ?capacity .

      OPTIONAL {
        ?place dbpediaO:address ?address .
        ?place rdfs:label ?label .
        ?place geo:lat ?lat .
        ?place geo:long ?lon .
      }

      filter (lang(?label) = "en" || lang(?label) = "eng")
      filter (lang(?address) = "en" || lang(?address) = "eng")

    }

Solution

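  • Your places can have multiple values for a property such as address; only the place URI itself is unique. Moreover, you should put each optional property in its own OPTIONAL clause, or at least use a separate OPTIONAL for lat/long: a single OPTIONAL block only binds its variables when all of its triple patterns match, so a place without an address also loses its label and coordinates. Note too that filters placed outside the OPTIONAL discard every row where ?label or ?address is unbound, because lang() of an unbound variable is an error and a FILTER error counts as false. For the label you do not need an OPTIONAL clause at all in DBpedia. The only way to get one row per place is to group by the place and sample or group_concat all other properties. Something like this: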

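    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?place (sample(?_label) as ?label)
    (group_concat(distinct ?capacity; separator=";") as ?capacities)
    (group_concat(distinct ?address; separator=";") as ?addresses) ?lat ?lon
    WHERE {
      # Every place needs a capacity and an English label
      ?place dbo:seatingCapacity ?capacity ;
             rdfs:label ?_label .
      filter (langmatches(lang(?_label), "en"))
      # Address is optional; keep only English values when present
      OPTIONAL {
           ?place dbo:address ?address .
           filter (langmatches(lang(?address), "en"))
      }
      # Coordinates are optional, but lat and long must match together
      OPTIONAL {
           ?place geo:lat ?lat ; geo:long ?lon .
      }
    }
    group by ?place ?lat ?lon
    order by desc(?place)
    limit 100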
    

As you can see from the results, some places also have multiple capacity values, which is why ?capacity is concatenated rather than returned as is.
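To illustrate why the aggregation removes the duplicates, here is a small client-side sketch in Python (not part of the original answer) that mimics GROUP BY ?place with SAMPLE() and GROUP_CONCAT(DISTINCT ...) over rows shaped like the SPARQL 1.1 JSON results format. The function name group_bindings, the example resource URI, and the literal values are hypothetical. The four input rows model one place with two capacities and two addresses, which is the cross product a plain SELECT returns. Unlike SPARQL, where the value picked by SAMPLE and the order of GROUP_CONCAT values are unspecified, this sketch simply keeps the first value seen.

```python
def group_bindings(bindings, key="place", sample_vars=("label", "lat", "lon"),
                   concat_vars=None, separator=";"):
    """Collapse SPARQL JSON result rows to one row per ?key.

    Hypothetical helper: mimics GROUP BY ?key with SAMPLE() for sample_vars
    and GROUP_CONCAT(DISTINCT ...; separator=...) for concat_vars.
    """
    if concat_vars is None:
        # Output names follow the aggregated query: ?capacities, ?addresses.
        concat_vars = {"capacity": "capacities", "address": "addresses"}
    groups = {}  # dicts keep insertion order, so groups appear first-seen first
    for row in bindings:
        group = groups.setdefault(
            row[key]["value"],
            {"sample": {}, "concat": {var: [] for var in concat_vars}},
        )
        for var in sample_vars:
            # Like SAMPLE(): keep a single value; here, the first one seen.
            if var in row and var not in group["sample"]:
                group["sample"][var] = row[var]["value"]
        for var, values in group["concat"].items():
            # Like GROUP_CONCAT(DISTINCT ...): skip repeats from the cross product.
            if var in row and row[var]["value"] not in values:
                values.append(row[var]["value"])
    collapsed = []
    for value, group in groups.items():
        out = {key: value, **group["sample"]}
        for var, values in group["concat"].items():
            out[concat_vars[var]] = separator.join(values)
        collapsed.append(out)
    return collapsed


PLACE = "http://dbpedia.org/resource/Example_Stadium"  # hypothetical resource


def literal(value):
    # Simplified binding: real results also carry "xml:lang" or "datatype".
    return {"type": "literal", "value": value}


# Two capacities x two addresses -> four rows for the same place.
rows = [
    {"place": {"type": "uri", "value": PLACE},
     "label": literal("Example Stadium"),
     "capacity": literal(capacity),
     "address": literal(address)}
    for capacity in ("50000", "52000")
    for address in ("1 Example Road", "Example Road 1")
]

print(group_bindings(rows))  # one row instead of four
```

The grouping key plays the role of ?place in the query: everything else is either sampled (one value is enough) or concatenated (all distinct values matter).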