wikipedia wikipedia-api wikidata mediawiki-api wikibase

The Wikipedia iwlinks table only stores some links to Wikidata pages. Where are the others?


I'm processing the Wikipedia dump extracts rather than using the Wikipedia API, because I'd like to run a lot of queries quickly.

I'd like to connect Wikipedia pages to their respective Wikidata pages. My understanding is the iwlinks table contains this information. However, although I've been able to verify this for some Wikipedia pages, I've also been able to verify that it's not the case for others.

For example, if we look up Metallica's Wikipedia page in the iwlinks table, we get:

iwl_from, iwl_prefix, iwl_title
'18787', 'c', 'Special:Search/Metallica'
'18787', 'd', 'Q15920'
'18787', 'q', 'Special:Search/Metallica'

Here the row with 'd' in the iwl_prefix column tells us where to find the Metallica Wikidata page (i.e. Q15920).

However, if we look up Tom Selleck's Wikipedia page in the iwlinks table using:

SELECT * FROM iwlinks WHERE iwl_from = 277451;

we get:

iwl_from, iwl_prefix, iwl_title
'277451', 'commons', 'Tom_Selleck'
'277451', 'q', 'Special:Search/Tom_Selleck'

Neither of these rows contains information about his Wikidata page. However, his Wikipedia page has a "Wikidata item" link to his Wikidata page, so presumably that association must be stored somewhere, but I can't find it.

I'd greatly appreciate any suggestions you can think of.

P.S. Bonus points if you can point me in the right direction to figure out where the licence information is stored for each image in Wikipedia.


Solution

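  • You can find the Wikidata item in the page_props table: look for the row whose pp_page is the page ID and whose pp_propname is 'wikibase_item'; its pp_value holds the item ID (e.g. Q15920). iwlinks only contains the interwiki links that appear in the page text. If you look at the bottom of the Metallica article, you'll see a little sister project box, which is just a wikitext template; that template is what generated those iwlinks entries. An article whose text contains no such link to Wikidata gets no 'd' row. The links in the sidebar used to come from langlinks, but Wikidata has largely replaced the old system of interlanguage links, so those associations are now stored on Wikidata instead.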
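
As a rough illustration of the page_props lookup, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for the imported dump (the real dump loads into MySQL/MariaDB, where pp_value is a binary column and may come back as bytes). The helper name wikidata_id and the sample rows are assumptions made for the example; only the Metallica page ID and Q15920 come from the question.

```python
import sqlite3

# In-memory stand-in for the imported page_props dump (assumption: the real
# table lives in MySQL/MariaDB and has an extra pp_sortkey column).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT)"
)

# Sample rows. The first pair (18787 -> Q15920) is taken from the question;
# the second row is a made-up placeholder showing that a page has other props.
conn.executemany(
    "INSERT INTO page_props VALUES (?, ?, ?)",
    [
        (18787, "wikibase_item", "Q15920"),
        (18787, "some_other_prop", "placeholder"),  # hypothetical row
    ],
)


def wikidata_id(conn, page_id):
    """Return the Wikidata item ID for a Wikipedia page ID, or None if absent."""
    row = conn.execute(
        "SELECT pp_value FROM page_props "
        "WHERE pp_page = ? AND pp_propname = 'wikibase_item'",
        (page_id,),
    ).fetchone()
    return row[0] if row else None


print(wikidata_id(conn, 18787))  # Metallica's page ID from the question
```

Against the real dump, the same SELECT should work unchanged in MySQL with the page ID substituted in (for example 277451 for Tom Selleck).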