mediawiki mediawiki-api

Wiki search API page title search


I am trying to search for pages that have a specific title and contain the Coord template. However, I am getting strange behavior.

For example, when searching for Taj Mahal, the following query returns decent results.

https://en.wikipedia.org/w/api.php?
  action=query&
  format=json&
  list=search&
  formatversion=2&
  srlimit=10&
  sroffset=0&
  srsearch=hastemplate:Coord intitle:Taj&
  srprop=
{
    "batchcomplete": true,
    "continue": {
        "sroffset": 10,
        "continue": "-||"
    },
    "query": {
        "searchinfo": {
            "totalhits": 60,
            "suggestion": "tai",
            "suggestionsnippet": "tai"
        },
        "search": [
            {
                "ns": 0,
                "title": "Taj Mahal",
                "pageid": 82976
            },
            {
                "ns": 0,
                "title": "Taj Mahal Palace Hotel",
                "pageid": 1983963
            },
            {
                "ns": 0,
                "title": "Taj Mahal Bangladesh",
                "pageid": 20579666
            },
            {
                "ns": 0,
                "title": "Black Taj Mahal",
                "pageid": 39649287
            },
            {
                "ns": 0,
                "title": "Taj Connemara",
                "pageid": 33725911
            },
            {
                "ns": 0,
                "title": "Taj-ul-Masajid",
                "pageid": 10014406
            },
            {
                "ns": 0,
                "title": "Taj-e Dowlatshah",
                "pageid": 38190785
            },
            {
                "ns": 0,
                "title": "Taj ol Din Kola",
                "pageid": 40983978
            },
            {
                "ns": 0,
                "title": "Taj Kuh, Zirkuh",
                "pageid": 35936872
            },
            {
                "ns": 0,
                "title": "Taj-ol Dowleh-ye Muziraj",
                "pageid": 40918093
            }
        ]
    }
}

But the following query returns no results, whereas I was expecting articles with Taj Ma in the title:

https://en.wikipedia.org/w/api.php?
  action=query&
  format=json&
  list=search&
  formatversion=2&
  srlimit=10&
  sroffset=0&
  srsearch=hastemplate:Coord intitle:Taj Ma&
  srprop=
{
    "batchcomplete": true,
    "query": {
        "searchinfo": {
            "totalhits": 0,
            "suggestion": "tai",
            "suggestionsnippet": "tai"
        },
        "search": []
    }
}

Even stranger, the following query returns Rowgir-e Taj Amiri but omits everything else, including Taj Mahal:

https://en.wikipedia.org/w/api.php?
  action=query&
  format=json&
  list=search&
  formatversion=2&
  srlimit=10&
  sroffset=0&
  srsearch=hastemplate:Coord intitle:Taj Mah&
  srprop=
{
    "batchcomplete": true,
    "query": {
        "searchinfo": {
            "totalhits": 1,
            "suggestion": "tai",
            "suggestionsnippet": "tai"
        },
        "search": [
            {
                "ns": 0,
                "title": "Rowgir-e Taj Amiri",
                "pageid": 41773498
            }
        ]
    }
}

Similarly, querying for Taj Maha returns Taj Coromandel but omits everything else, including Taj Mahal.

I have found similar behavior when trying this search pattern for many other places, London for example.

Unless I am misunderstanding how this API works, I expected Taj Mahal to be present in the response to every search request I made.


Solution

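  • intitle:Taj Ma means "every page whose title contains Taj and whose title or text contains Ma". The intitle: keyword applies only to the word that immediately follows it, so Ma is treated as a separate, ordinary search term.

    As a rule of thumb, always quote a multi-word search term: intitle:"Taj Ma". However, this also returns nothing, presumably because the search matches whole words and Ma is not a complete word in Taj Mahal.

    Using a regex as the search term bypasses that quirk: intitle:/Taj Ma/. Note that regex matching is case-sensitive unless you append i, as in intitle:/taj ma/i.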
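
    The regex form can be assembled programmatically. This is a minimal sketch, not an official client: the helper names build_srsearch and build_search_url are hypothetical, the parameters are copied from the queries in the question, and the title fragment is assumed to contain no slashes or regex metacharacters (it is inserted into the pattern unescaped).

```python
from urllib.parse import urlencode

# Endpoint taken from the queries in the question.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_srsearch(title_fragment, template="Coord"):
    """Build the srsearch value using the regex form of intitle:.

    Hypothetical helper. Assumes title_fragment has no slashes or
    regex metacharacters, since it is placed in the pattern as-is.
    """
    return f"hastemplate:{template} intitle:/{title_fragment}/"


def build_search_url(title_fragment, limit=10, offset=0):
    """Return the full request URL with the same parameters as the question."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "formatversion": 2,
        "srlimit": limit,
        "sroffset": offset,
        "srsearch": build_srsearch(title_fragment),
        "srprop": "",
    }
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_srsearch("Taj Ma"))
    print(build_search_url("Taj Ma"))
```

    The resulting URL can be fetched with any HTTP client. Building the query string with urlencode avoids hand-escaping the slashes and the space inside the regex.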