I have an extensive list of Geonames IDs for which I want to find the matching Wikidata IDs. I would like to use Pywikibot and, if possible, iterate over the list.
The SPARQL query for an individual Geonames ID would be:
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P1566 ?statement0.
      ?statement0 (ps:P1566) "2867714".
    }
  }
}
2867714 is the Geonames ID for Munich, and running the query via the following script returns the correct Wikidata ID:
import pywikibot
from pywikibot import pagegenerators as pg
# read query file
with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
    QUERY = query_file.read()
#print(QUERY)
# create generator based on query
# returns an iterator that produces a sequence of values when iterated over
# useful when creating large sequences of values
wikidata_site = pywikibot.Site("wikidata", "wikidata")
generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)
print(generator)
# OUTPUT: <generator object WikidataSPARQLPageGenerator.<locals>.<genexpr> at 0x00000169FAF3FD10>
# iterate over generator
for item in generator:
    print(item)
The correct output returned is: wikidata:Q32664319
Ideally, I want to replace the hard-coded ID with a variable so that I can substitute the IDs from my list one after another. I checked the Pywikibot documentation but could not find information on this specific use case. How can I replace the individual ID with a variable and iterate over my ID list?
First, why do you use a subquery? You can simplify its syntax as:
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "2867714".
}
Coming to your question, you can use Python's string interpolation (the % operator) to generalize your query by putting a %s placeholder where the ID goes:
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "%s".
}
and then instantiate it as QUERY % "2867714".
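As a minimal sketch of that interpolation step: the helper name `build_query` and the digits-only check are my own additions, not part of Pywikibot. The check is there because the ID is pasted straight into the query text, so a stray quote or brace in the input list would otherwise produce a broken query.

```python
# The query template from above, with "%s" marking where the Geonames ID goes.
QUERY_TEMPLATE = """SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "%s".
}"""


def build_query(geonames_id):
    """Return the SPARQL query for one Geonames ID (hypothetical helper)."""
    geonames_id = str(geonames_id).strip()
    # Assumption: Geonames IDs are plain decimal numbers, so anything else is rejected.
    if not (geonames_id.isascii() and geonames_id.isdigit()):
        raise ValueError("not a numeric Geonames ID: %r" % geonames_id)
    return QUERY_TEMPLATE % geonames_id


print(build_query("2867714"))
```

The string returned by `build_query` can be passed to `pg.WikidataSPARQLPageGenerator` in place of the file contents.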
With a list of IDs, it would be something like:
import pywikibot
from pywikibot import pagegenerators as pg

# read the query template (it must contain the "%s" placeholder)
with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
    QUERY = query_file.read()

geonames_ids = ["2867714", "2867715", "2867716"]
wikidata_site = pywikibot.Site("wikidata", "wikidata")  # create the site once, outside the loop
for geonames_id in geonames_ids:
    generator = pg.WikidataSPARQLPageGenerator(QUERY % geonames_id, site=wikidata_site)
    ...
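If you want to keep track of which Wikidata item belongs to which Geonames ID, one possible way to fill in the loop body is to collect the results in a dictionary. This is only a sketch under a few assumptions: the function names `map_geonames_to_wikidata` and `run_with_pywikibot` are made up for illustration, the template is expected to contain the `%s` placeholder, and `item.title()` is used to get the Q-identifier as a string. The query runner is passed in as a parameter so the mapping logic can be tried out without a network connection.

```python
def map_geonames_to_wikidata(geonames_ids, query_template, run_query):
    """Return {geonames_id: [wikidata_id, ...]} (hypothetical helper).

    run_query is any callable that takes a SPARQL string and returns an
    iterable of Wikidata IDs.
    """
    mapping = {}
    for geonames_id in geonames_ids:
        # An ID without a match ends up with an empty list.
        mapping[geonames_id] = list(run_query(query_template % geonames_id))
    return mapping


def run_with_pywikibot(query):
    """Run one query through Pywikibot and yield the IDs of the matching items."""
    # Imported here so the helper above can be used without Pywikibot installed.
    import pywikibot
    from pywikibot import pagegenerators as pg

    site = pywikibot.Site("wikidata", "wikidata")
    for item in pg.WikidataSPARQLPageGenerator(query, site=site):
        yield item.title()
```

Usage would then look like `mapping = map_geonames_to_wikidata(geonames_ids, QUERY, run_with_pywikibot)`. Note that this still sends one request per ID, so for a very long list it may be worth adding a pause between requests.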