sparql wikipedia wikidata openrefine grel

How to reconcile in OpenRefine by Wikipedia article title?


I want to reconcile a large number of records for which I have the exact Wikipedia article titles (including parenthetical disambiguation). What is the best/fastest way to match these records by their exact Wikipedia title in OpenRefine? If I simply reconcile by text, the confidence is low and Wikidata entries with the same title get mixed up.


Solution

  • Transform your values into Wikipedia URLs, for instance with the following GREL formula (assuming all articles are on the English Wikipedia):

    'https://en.wikipedia.org/wiki/'+value
    

    You can then reconcile this column with the Wikidata reconciliation service, which will recognize these URLs and resolve the Wikidata items via site links.

    If some of your article titles point to disambiguation pages, reconciliation will return the corresponding disambiguation items, so it is good practice to double-check the type (P31) of each match by fetching it after reconciliation.
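
If you would rather prepare the column before importing it into OpenRefine, the same two steps can be sketched in Python. This is only an illustration under a few assumptions: the function names are hypothetical, all titles are assumed to be on the English Wikipedia, and replacing spaces with underscores is an extra normalization that the GREL formula above does not perform (the answer does not say whether the reconciliation service needs it). Q4167410 is the Wikidata item for "Wikimedia disambiguation page", which is what a P31 check would look for.

```python
from urllib.parse import quote

# Assumption: every title is an English Wikipedia article title.
WIKI_PREFIX = "https://en.wikipedia.org/wiki/"
# Wikidata item "Wikimedia disambiguation page".
DISAMBIGUATION_QID = "Q4167410"


def title_to_url(title):
    """Build an article URL from an exact title (hypothetical helper)."""
    # Canonical Wikipedia URLs use underscores where the title has spaces.
    normalized = title.strip().replace(" ", "_")
    # Percent-encode unsafe characters; keep slashes, parentheses,
    # commas, colons and apostrophes readable.
    return WIKI_PREFIX + quote(normalized, safe="/(),:'")


def is_disambiguation(p31_values):
    """Return True if the fetched P31 values include the disambiguation class."""
    return DISAMBIGUATION_QID in p31_values


if __name__ == "__main__":
    for title in ["Mercury (planet)", "Mercury (element)"]:
        print(title_to_url(title))
```

The P31 values passed to `is_disambiguation` would come from the column you fetch after reconciliation, exported as a list of item IDs per row.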