I'm trying to query businesses that have LinkedIn identifiers, ISINS, LEIs, or English websites. I only want one row of data per company if possible.
I'm struggling with the last part. I think I successfully identified that I only want English websites to populate, but the actual URLs are not populating in the result.
SELECT DISTINCT ?item ?itemLabel ?linkedin ?isin ?lei ?website WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
{
SELECT DISTINCT ?item ?linkedin ?isin ?lei ?website WHERE {
?item wdt:P31 wd:Q4830453.
{
?item wdt:P4264 ?linkedin.}
UNION
{
?item wdt:P946 ?isin.}
UNION
{
?item wdt:P1278 ?lei.}
UNION
{
?item wdt:P856 ?statement0.
?statement0 p:P856 [ps:P856 ?website ; pq:P407 wd:Q1860] }
}
LIMIT 100
}
}
The other data is populating, but not the websites. I'm also getting multiple records per company.
For what concerns websites, note that
?item wdt:P856 ?statement0.
?statement0 p:P856 [ps:P856 ?website ; pq:P407 wd:Q1860]
is wrong. Just use ?item p:P856 [ps:P856 ?website ; pq:P407 wd:Q1860] .
. Check Wikidata:SPARQL tutorial § Qualifiers for how to query for qualifiers.
Your query can be corrected by first selecting only the ?item
having at least one of such statements (using UNION
), and then using OPTIONAL
for getting all and only the statements that are actually specified for each ?item
.
The resulting query would be something like:
SELECT DISTINCT ?item ?itemLabel ?linkedin ?isin ?lei ?website WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
OPTIONAL { ?item wdt:P4264 ?linkedin . }
OPTIONAL { ?item wdt:P946 ?isin . }
OPTIONAL { ?item wdt:P1278 ?lei . }
OPTIONAL { ?item p:P856 [ps:P856 ?website ; pq:P407 wd:Q1860] . }
{
SELECT DISTINCT ?item WHERE {
?item wdt:P31 wd:Q4830453 .
{ ?item wdt:P4264 ?linkedin . }
UNION
{ ?item wdt:P946 ?isin . }
UNION
{ ?item wdt:P1278 ?lei . }
UNION
{ ?item p:P856 [ps:P856 ?website ; pq:P407 wd:Q1860] . }
hint:SubQuery hint:runOnce true .
}
LIMIT 100
}
hint:Prior hint:runFirst true .
}
Note that:
GROUP_CONCAT
.LIMIT 100
in your subquery you may end up with incomplete results (for example, the same company may have both P4264 and P946, but maybe only the first one occurs in the first 100 results). In order to fix this issue, you should move out the LIMIT
command.