python mediawiki wikipedia mediawiki-api pywikibot

How to get Wikipedia out-links of an article in Python?


I want to get the out-links of Wikipedia articles. By out-links I mean the links listed in the "What links here" section of a Wikipedia article.

For instance, consider the data mining Wikipedia article. The "What links here" section of this article is at: https://en.wikipedia.org/wiki/Special:WhatLinksHere/Data_mining

I tried to use pywikibot as follows.

import pywikibot as pw

site = pw.Site('en', 'wikipedia')
print([
    cat.title()
    for cat in pw.Page(site, 'data mining').categories()
    if 'hidden' not in cat.categoryinfo
])

However, it seems that the categories in pywikibot are different from the out-links of Wikipedia articles. Therefore, I am wondering how to do this in Python.

Note: I am not limited to pywikibot and am happy to explore other libraries such as mediawiki.

I am happy to provide more details if needed.


Solution

  • Try the Page.embeddedin() and Page.backlinks() methods. You could also directly use the equivalent modules of MediaWiki's API:
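
A minimal sketch of both routes, under some assumptions. The "What links here" page lists pages that link to the article, which is what Page.backlinks() returns, while Page.embeddedin() returns pages that transclude it. For the direct API route the sketch assumes the `list=backlinks` query module with its `bltitle`, `blnamespace` and `bllimit` parameters, and follows the `continue` object the API returns when more results remain. The function names (`fetch_json`, `iter_backlinks`, `backlinks_pywikibot`) and the User-Agent string are hypothetical, chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace it with one that identifies your script.
    request = urllib.request.Request(
        url, headers={"User-Agent": "backlinks-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_backlinks(title, fetch=fetch_json, namespace=0, limit=500):
    """Yield titles of pages that link to `title` (the "What links here" list).

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": namespace,  # 0 = articles only
        "bllimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(params)
        for item in data["query"]["backlinks"]:
            yield item["title"]
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


def backlinks_pywikibot(title, total=20):
    """Same idea with pywikibot; imported lazily so it stays optional."""
    import pywikibot

    site = pywikibot.Site("en", "wikipedia")
    page = pywikibot.Page(site, title)
    # Swap backlinks() for embeddedin() to list transclusions instead.
    return [p.title() for p in page.backlinks(namespaces=0, total=total)]
```

For example, `for t in iter_backlinks("Data mining"): print(t)` would walk through every article-namespace page linking to the article, one API request per batch of up to 500 titles.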