python mediawiki-api revision-history

MediaWiki API revisions VS allrevisions


I am trying to write a script to get the revision history of biographies (the goal is to investigate how a biography changes over time). I have read most of the related posts here and the documentation for the revisions module, but I can't get the results I want. My code is below; most of it is copied (partially or completely) from the documentation, with only the value of the titles parameter changed.

I also found the allrevisions submodule. I managed to get it to return revisions for a specific biography, but what I get doesn't match the revision history shown on the page itself.

Code related to "revisions"

import requests
S = requests.session()
URL = "https://www.mediawiki.org/w/api.php"

PARAMS = {
    "action": "query",
    "prop": "revisions",
    "titles": "Albert Einstein",
    "rvprop": "timestamp|user|content",
    "rvslots": "main",
    "formatversion": "2",
    "format": "json"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
print(DATA)

Code related to "allrevisions"

URL = "https://www.mediawiki.org/w/api.php"

PARAMS = {
    "action": "query",
    "list": "allrevisions",
    "titles": "Albert Einstein",
    "arvprop": "user|timestamp|content",
    "arvslots": "main",
    "arvstart": "2020-11-12T12:06:00Z",
    "formatversion": "2",
    "format": "json"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
print(DATA)

Any suggestions to make it work properly? Most importantly, why doesn't the code related to "revisions" return anything?

As suggested, I want to get the full revision history for a specific page.


Solution

  • prop modules return information about a specific page (or set of pages) that you provide. list modules return information about a list of pages for which you only provide some abstract criteria; finding the pages that match those criteria is part of the work the API does. As such, titles in your second example will essentially be ignored.

    You don't explain clearly what you are trying to do, but I'm guessing you want the full page history for a specific title. In that case your first example is mostly right, except that you should set a higher rvlimit.

    See also the (unfortunately not very good) documentation on continuing queries, since many pages have a history that is too long to return in a single request.
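
The two points above (a higher rvlimit plus query continuation) can be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the helper names iter_revisions and http_fetch are made up for illustration, the endpoint is assumed to be English Wikipedia (the question's URL points at mediawiki.org), and rvprop leaves out content to keep responses small.

```python
# Assumption: the biography lives on English Wikipedia, not mediawiki.org.
API_URL = "https://en.wikipedia.org/w/api.php"


def iter_revisions(title, fetch, limit="max"):
    """Yield revision dicts for one title, following API continuation.

    `fetch` is any callable that takes a dict of query parameters and
    returns the decoded JSON response (hypothetical helper interface).
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": limit,  # "max" asks for as many revisions as allowed
        "formatversion": "2",
        "format": "json",
    }
    while True:
        data = fetch(params)
        # With formatversion=2, "pages" is a list of page objects.
        for page in data.get("query", {}).get("pages", []):
            # A missing page has no "revisions" key, so nothing is yielded.
            yield from page.get("revisions", [])
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


def http_fetch(params):
    """Send one GET request to API_URL and return the parsed JSON."""
    # Imported here so iter_revisions can be used without requests installed.
    import requests

    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()
```

Something like list(iter_revisions("Albert Einstein", http_fetch)) would then collect the whole history, one request per batch. Adding content to rvprop (together with rvslots) should also work, but the API allows fewer revisions per request in that case, so more round trips are needed.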