sparql wikidata wikidata-query-service

How do you count the entities of a type that share values in Wikidata or subparts of Wikidata?


I want to count how many entities in a selection share a property value with at least one other entity in that selection. For example, how many paintings share a 'depicts' (P180) value with another painting? My attempts with the following SPARQL query on WDQS (or on a small local subset of Wikidata) often result in a timeout.

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wd: <http://www.wikidata.org/entity/>
SELECT (count(distinct ?entity1) as ?c)
WHERE {
  ?entity1 wdt:P31 <http://www.wikidata.org/entity/Q3305213>; wdt:P180 ?val  .
  ?entity2 wdt:P31 <http://www.wikidata.org/entity/Q3305213>; wdt:P180 ?val .
  filter(?entity1!=?entity2)
}

Is there a better way of formulating the SPARQL query to get a result?


Solution

  • Since you don't use the variable ?entity2 in your select statement, a more efficient query would be the following one:

    SELECT (count(distinct ?entity1) as ?c)
    WHERE {
      ?entity1 wdt:P31 wd:Q3305213;
               wdt:P180 ?val .
      filter exists {
        ?entity2 wdt:P31 wd:Q3305213;
                 wdt:P180 ?val .
        filter(?entity1 != ?entity2)
      }
    }
    

    Unfortunately, this query also seems to time out.

    Alternatively, you can use the following query:

    SELECT (count(distinct ?entity) as ?countEntity)
    WHERE {
      {
        SELECT ?val (count(distinct ?entity) as ?countVal)
        WHERE {
          ?entity wdt:P31 wd:Q3305213 ;
                  wdt:P180 ?val .
          hint:SubQuery hint:runOnce true .
        }
        GROUP BY ?val
        HAVING (?countVal > 1)
      }
      hint:Prior hint:runFirst true .
      ?entity wdt:P31 wd:Q3305213 ;
              wdt:P180 ?val .
    }
    

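    which runs in about 35 seconds. Note that the hint: triples are Blazegraph-specific query hints, so other SPARQL engines may not recognise them.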

    Intuitively, the inner query retrieves every value ?val shared by at least two paintings (by checking that ?countVal > 1); the outer query then counts the distinct entities that have at least one such value.
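To check the two-step logic on a small scale, here is a minimal Python sketch that mirrors it on an in-memory list of (entity, value) pairs. It is only an illustration of the idea, not a replacement for the SPARQL query: the function name `count_entities_sharing_value` and the `TOY_PAIRS` data are hypothetical, made up for this example, and do not come from Wikidata.

```python
from collections import defaultdict


def count_entities_sharing_value(pairs):
    """Count entities that share at least one value with another entity.

    `pairs` is an iterable of (entity, value) tuples, playing the role of
    the `?entity wdt:P180 ?val` triples in the query.
    """
    pairs = list(pairs)  # materialise, since we iterate twice

    # Step 1 (inner query): group by value and collect the distinct entities.
    entities_by_value = defaultdict(set)
    for entity, value in pairs:
        entities_by_value[value].add(entity)

    # Keep only values held by more than one distinct entity (HAVING ... > 1).
    shared_values = {
        value for value, entities in entities_by_value.items()
        if len(entities) > 1
    }

    # Step 2 (outer query): count distinct entities having a shared value.
    return len({entity for entity, value in pairs if value in shared_values})


# Hypothetical toy data: painting identifiers and what they depict.
TOY_PAIRS = [
    ("p1", "dog"),
    ("p2", "dog"),
    ("p2", "tree"),
    ("p3", "cat"),
    ("p4", "tree"),
    ("p4", "tree"),  # duplicate statement, must not count twice
]

if __name__ == "__main__":
    print(count_entities_sharing_value(TOY_PAIRS))
```

In the toy data, "dog" is shared by p1 and p2 and "tree" by p2 and p4, while "cat" belongs to p3 only, so three entities are counted. Grouping by value first avoids comparing every pair of entities, which is the same reason the subquery formulation is cheaper than the self-join.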