python wikidata pywikibot

Matching specific Geonames IDs with Wikidata IDs using Pywikibot


I have an extensive list of Geonames IDs for which I want to find the matching Wikidata IDs. I would like to use Pywikibot and, if possible, iterate over the list.

The SPARQL query for an individual Geonames ID would be:

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P1566 ?statement0.
      ?statement0 (ps:P1566) "2867714".
    }
  }
}

2867714 is the Geonames ID for Munich, and running the query via the following script returns the correct Wikidata ID:

import pywikibot
from pywikibot import pagegenerators as pg

# read query file

with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
    QUERY = query_file.read()
    #print(QUERY)
    
# create generator based on query
# returns an iterator that produces a sequence of values when iterated over
# useful when creating large sequences of values

wikidata_site = pywikibot.Site("wikidata", "wikidata")
generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)

print(generator)

# OUTPUT: <generator object WikidataSPARQLPageGenerator.<locals>.<genexpr> at 0x00000169FAF3FD10>

# iterate over generator

for item in generator:
    print(item)

The correct output returned is: wikidata:Q32664319

Ideally, I want to replace the hard-coded ID with a variable so that I can feed in the IDs from my list one after another. I checked the Pywikibot documentation but could not find information on this use case. How can I replace the individual ID with a variable and iterate over my ID list?


Solution

  • First, the subquery is not needed. You can simplify the query with a property path:

    SELECT DISTINCT ?item ?itemLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
      ?item p:P1566/ps:P1566 "2867714".
    }
    

    As for your question, you can use Python's %-style string formatting to turn the query into a template with a "%s" placeholder:

    SELECT DISTINCT ?item ?itemLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
      ?item p:P1566/ps:P1566 "%s".
    }
    

    and then instantiate it as QUERY % "2867714".

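    With a list of IDs, it would be something like this (the .rq file must contain the "%s" version of the query):

    import pywikibot
    from pywikibot import pagegenerators as pg

    with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
        QUERY = query_file.read()

    # create the site object once and reuse it for every query
    wikidata_site = pywikibot.Site("wikidata", "wikidata")

    geonames_ids = ["2867714", "2867715", "2867716"]
    for geonames_id in geonames_ids:
        # fill the placeholder with the current ID
        generator = pg.WikidataSPARQLPageGenerator(QUERY % geonames_id, site=wikidata_site)
        ...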
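
If the ID list comes from a file or spreadsheet, it may contain stray whitespace or non-numeric entries that would produce useless queries. The following is a minimal sketch of building one query per ID before handing it to Pywikibot. The names QUERY_TEMPLATE and build_queries are hypothetical helpers, not part of Pywikibot, and the rule that a Geonames ID consists only of ASCII digits is an assumption made for this example.

```python
# Hypothetical template: the simplified query with a "%s" placeholder
# where the Geonames ID goes.
QUERY_TEMPLATE = """SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "%s".
}"""


def build_queries(geonames_ids, template=QUERY_TEMPLATE):
    """Yield (geonames_id, query) pairs, skipping entries that are not numeric."""
    for raw_id in geonames_ids:
        geonames_id = str(raw_id).strip()
        # Assumption: a valid Geonames ID is a non-empty string of ASCII digits.
        if not (geonames_id.isascii() and geonames_id.isdigit()):
            continue
        yield geonames_id, template % geonames_id


# Map each cleaned ID to the query text that would be sent for it.
queries = dict(build_queries(["2867714", " 2867715\n", "not-an-id"]))
```

Each query text can then be passed to pg.WikidataSPARQLPageGenerator(query, site=wikidata_site) as in the loop above. Keeping the Geonames ID next to its query makes it easy to record which Wikidata item belongs to which ID.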