sparql dbpedia

Retrieve dbpedia subject categories with SPARQL


Is there a way to retrieve all categories from dcterms:subject inside dbpedia?

As an example, in http://dbpedia.org/page/Eiffel_Tower I can see in dcterms:subject the following categories:

I wish to retrieve all category:xxx values in dbpedia. Is there a way?


Solution

  • If you run a COUNT query to see how many categories DBpedia has, using the following SPARQL query:

    SELECT (COUNT(DISTINCT ?category) AS ?count) WHERE {?subject dcterms:subject ?category}
    

    you'll find that DBpedia has 503788 categories. If you query for all of them at once, the endpoint will not return the full set, because it caps the number of results a single query can return. You can, however, issue multiple queries using LIMIT and OFFSET. For example, to get the first 1000 categories you can run the following query:

    SELECT DISTINCT ?category WHERE {?subject dcterms:subject ?category} ORDER BY ?category LIMIT 1000 OFFSET 0
    

    I don't know how you plan to use this information, but my recommendation would be to run multiple queries with an increasing offset (e.g. 1000, 2000, 3000) and cache the results in whatever storage you are using. You can write a program that executes the queries and places the results in the cache.

    Do remember, however, that the categories in DBpedia are hierarchical, so one category can be a broader category of several others.
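
The paging approach recommended above could be sketched roughly as follows in Python. This is only an illustration under some assumptions, not a definitive client: the endpoint URL `https://dbpedia.org/sparql`, the helper names (`build_query`, `run_query`, `fetch_all_categories`) and the page size are all my own choices. The `ORDER BY` clause is added so that consecutive pages do not overlap, since SPARQL does not guarantee a stable order otherwise. The endpoint may also impose limits of its own on large sorted offsets.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint; point this at a mirror if you have one.
ENDPOINT = "https://dbpedia.org/sparql"

# Same query as in the answer, with an explicit prefix and ORDER BY for stable paging.
QUERY_TEMPLATE = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?category
WHERE {{ ?subject dcterms:subject ?category }}
ORDER BY ?category
LIMIT {limit} OFFSET {offset}
"""


def build_query(limit, offset):
    """Return the SPARQL text for one page of results."""
    return QUERY_TEMPLATE.format(limit=limit, offset=offset)


def run_query(query, endpoint=ENDPOINT):
    """Send one query over HTTP GET and return the category URIs on that page."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [row["category"]["value"] for row in data["results"]["bindings"]]


def fetch_all_categories(page_size=1000, fetch_page=run_query):
    """Request pages with an increasing OFFSET until a short page comes back."""
    categories = []
    offset = 0
    while True:
        page = fetch_page(build_query(page_size, offset))
        categories.extend(page)
        if len(page) < page_size:
            # A page shorter than page_size means there is nothing left to fetch.
            break
        offset += page_size
    return categories
```

Calling `fetch_all_categories()` returns a plain list of category URIs that you can then write to a file or database as your cache. The `fetch_page` parameter is there so the HTTP call can be swapped out, for example to add a delay between requests or to test the loop offline.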