I want to see how many entities in a selection share a property value with at least one other entity in that selection. For example, how many paintings (Q3305213) share a 'depicts' (P180) value with another painting? My attempts with the following SPARQL query on WDQS (or on a small local subset of Wikidata) often time out.
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wd: <http://www.wikidata.org/entity/>
SELECT (count(distinct ?entity1) as ?c)
WHERE {
  # two different paintings that depict the same ?val
  ?entity1 wdt:P31 wd:Q3305213; wdt:P180 ?val .
  ?entity2 wdt:P31 wd:Q3305213; wdt:P180 ?val .
  filter(?entity1 != ?entity2)
}
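One likely reason for the timeouts is the size of the join itself: the two triple patterns pair every painting with every painting that depicts the same value, so a value depicted by n paintings yields n*n bindings, and the filter only removes the n bindings where ?entity1 = ?entity2. The Python sketch below counts those intermediate bindings, assuming the (painting, value) pairs are already in memory; the function name and the sample data are hypothetical.

```python
from collections import Counter

def self_join_size(pairs):
    """Count the (?entity1, ?entity2, ?val) bindings produced by joining
    the two triple patterns on ?val, before FILTER(?entity1 != ?entity2)."""
    # Number of distinct paintings per depicted value (triples form a set)
    per_value = Counter(val for _entity, val in set(pairs))
    # Each value shared by n paintings yields n * n ordered pairs
    return sum(n * n for n in per_value.values())

# Hypothetical sample data: (painting, depicted value)
pairs = [("A", "horse"), ("B", "horse"), ("C", "horse"), ("D", "tree")]
print(self_join_size(pairs))  # 3*3 + 1*1 = 10
```

With a popular value, this quadratic growth dominates: a single value depicted by 10,000 paintings would on its own yield 100 million bindings.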
Is there a better way of formulating the SPARQL query to get a result?
Since you don't use the variable ?entity2 in your SELECT clause, you only need to check that such a second painting exists rather than enumerate all of them, so a more efficient query would be the following one:
SELECT (count(distinct ?entity1) as ?c)
WHERE {
  ?entity1 wdt:P31 wd:Q3305213;
           wdt:P180 ?val .
  # only test that another painting with the same ?val exists
  filter exists {
    ?entity2 wdt:P31 wd:Q3305213;
             wdt:P180 ?val .
    filter(?entity1 != ?entity2)
  }
}
Unfortunately, this seems to run out of time too.
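Alternatively, you can use the following query, which first computes the shared values in a subquery. It relies on Blazegraph query hints (the hint: lines), which are available on WDQS but may need to be removed when running the query on another triple store: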
SELECT (count(distinct ?entity) as ?countEntity)
WHERE {
  {
    # values depicted by at least two distinct paintings
    SELECT ?val (count(distinct ?entity) as ?countVal)
    WHERE {
      ?entity wdt:P31 wd:Q3305213 ;
              wdt:P180 ?val .
      hint:SubQuery hint:runOnce true .
    }
    GROUP BY ?val
    HAVING (?countVal > 1)
  }
  hint:Prior hint:runFirst true .
  # paintings that depict at least one of those values
  ?entity wdt:P31 wd:Q3305213 ;
          wdt:P180 ?val .
}
which runs in about 35 seconds.
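Intuitively, the inner query retrieves all the values ?val shared by at least two paintings (by checking that ?countVal > 1); then the outer query counts the paintings having at least one such value. The hints ask Blazegraph to evaluate the subquery only once (hint:runOnce) and before the outer triple patterns (hint:runFirst), so the outer part only has to match paintings against the already computed set of shared values instead of pairing paintings with each other.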
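To make the equivalence with the original query concrete, here is a minimal in-memory Python sketch that mirrors the inner and outer query and compares the result with the pairwise semantics of the first query. The function names and sample data are hypothetical and only illustrate the logic; they are not a WDQS API.

```python
from collections import defaultdict

def count_sharing_grouped(pairs):
    """Mirror of the subquery approach: GROUP BY ?val / HAVING, then count."""
    # Inner query: distinct paintings per value (count(distinct ?entity))
    entities_by_val = defaultdict(set)
    for entity, val in pairs:
        entities_by_val[val].add(entity)
    # HAVING (?countVal > 1): keep only values shared by at least two paintings
    shared_vals = {val for val, ents in entities_by_val.items() if len(ents) > 1}
    # Outer query: count distinct paintings having at least one shared value
    return len({entity for entity, val in pairs if val in shared_vals})

def count_sharing_pairwise(pairs):
    """Mirror of the original self-join with FILTER(?entity1 != ?entity2)."""
    return len({e1 for e1, v1 in pairs for e2, v2 in pairs
                if v1 == v2 and e1 != e2})

# Hypothetical sample data: (painting, depicted value)
pairs = [
    ("A", "horse"), ("A", "dog"),
    ("B", "horse"),
    ("C", "tree"), ("D", "tree"),
    ("E", "cat"),
]
print(count_sharing_grouped(pairs))   # 4 (A, B share "horse"; C, D share "tree")
print(count_sharing_pairwise(pairs))  # 4
```

The grouped version handles each pair a constant number of times, while the pairwise version compares every pair with every other pair; this is the same contrast as between the subquery approach and the original self-join.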