I am trying to search for pages whose title contains a specific term and that use the Coord template.
However, I am getting strange behavior.
For example, when searching for Taj Mahal, the following query returns decent results:
https://en.wikipedia.org/w/api.php?
action=query&
format=json&
list=search&
formatversion=2&
srlimit=10&
sroffset=0&
srsearch=hastemplate:Coord intitle:Taj&
srprop=
{
"batchcomplete": true,
"continue": {
"sroffset": 10,
"continue": "-||"
},
"query": {
"searchinfo": {
"totalhits": 60,
"suggestion": "tai",
"suggestionsnippet": "tai"
},
"search": [
{
"ns": 0,
"title": "Taj Mahal",
"pageid": 82976
},
{
"ns": 0,
"title": "Taj Mahal Palace Hotel",
"pageid": 1983963
},
{
"ns": 0,
"title": "Taj Mahal Bangladesh",
"pageid": 20579666
},
{
"ns": 0,
"title": "Black Taj Mahal",
"pageid": 39649287
},
{
"ns": 0,
"title": "Taj Connemara",
"pageid": 33725911
},
{
"ns": 0,
"title": "Taj-ul-Masajid",
"pageid": 10014406
},
{
"ns": 0,
"title": "Taj-e Dowlatshah",
"pageid": 38190785
},
{
"ns": 0,
"title": "Taj ol Din Kola",
"pageid": 40983978
},
{
"ns": 0,
"title": "Taj Kuh, Zirkuh",
"pageid": 35936872
},
{
"ns": 0,
"title": "Taj-ol Dowleh-ye Muziraj",
"pageid": 40918093
}
]
}
}
However, the following query returns no results, although I was expecting articles with Taj Ma in the title:
https://en.wikipedia.org/w/api.php?
action=query&
format=json&
list=search&
formatversion=2&
srlimit=10&
sroffset=0&
srsearch=hastemplate:Coord intitle:Taj Ma&
srprop=
{
"batchcomplete": true,
"query": {
"searchinfo": {
"totalhits": 0,
"suggestion": "tai",
"suggestionsnippet": "tai"
},
"search": []
}
}
Even stranger, the following query returns Rowgir-e Taj Amiri but omits everything else, including Taj Mahal:
https://en.wikipedia.org/w/api.php?
action=query&
format=json&
list=search&
formatversion=2&
srlimit=10&
sroffset=0&
srsearch=hastemplate:Coord intitle:Taj Mah&
srprop=
{
"batchcomplete": true,
"query": {
"searchinfo": {
"totalhits": 1,
"suggestion": "tai",
"suggestionsnippet": "tai"
},
"search": [
{
"ns": 0,
"title": "Rowgir-e Taj Amiri",
"pageid": 41773498
}
]
}
}
Similarly, querying for Taj Maha returns Taj Coromandel but omits everything else, including Taj Mahal.
I have found similar behavior when trying this search pattern for many other places, London for example.
Unless I am misunderstanding how this API works, I was expecting Taj Mahal to be present in the response to every search request I made.
intitle:Taj Ma means "every page whose title contains Taj and whose title or text contains Ma": the intitle: prefix applies only to the word that immediately follows it.
A rule of thumb is to always quote the search term: intitle:"Taj Ma". However, this doesn't return anything either, presumably because MediaWiki does not consider Taj Ma a subphrase of Taj Mahal.
Using a regex as the search term bypasses that quirk: intitle:/Taj Ma/.
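
To compare the three forms side by side, the request URL can be assembled programmatically. The following is only a minimal sketch: `build_search_url` and its `mode` argument are hypothetical helpers invented for illustration, while the query parameters are the ones used in the question. It assumes the title fragment contains no regex metacharacters or `/`, since it does no escaping.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API = "https://en.wikipedia.org/w/api.php"


def build_search_url(title_fragment, mode="regex", template="Coord",
                     limit=10, offset=0):
    """Build a list=search URL (hypothetical helper, not part of any library).

    mode selects how the title fragment is written into srsearch:
      "bare"   -> intitle:Taj Ma     (only the first word is tied to the title)
      "quoted" -> intitle:"Taj Ma"   (phrase match on the title)
      "regex"  -> intitle:/Taj Ma/   (regex match on the title)
    """
    if mode == "bare":
        title_filter = f"intitle:{title_fragment}"
    elif mode == "quoted":
        title_filter = f'intitle:"{title_fragment}"'
    elif mode == "regex":
        # Assumption: the fragment needs no escaping for the regex syntax.
        title_filter = f"intitle:/{title_fragment}/"
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "formatversion": 2,
        "srlimit": limit,
        "sroffset": offset,
        "srsearch": f"hastemplate:{template} {title_filter}",
        "srprop": "",
    }
    # urlencode percent-encodes the spaces, quotes, and slashes in srsearch.
    return f"{API}?{urlencode(params)}"


if __name__ == "__main__":
    for mode in ("bare", "quoted", "regex"):
        url = build_search_url("Taj Ma", mode=mode)
        # Decode the query string again to show what the server receives.
        sent = parse_qs(urlsplit(url).query, keep_blank_values=True)
        print(mode, "->", sent["srsearch"][0])
```

Fetching each URL and comparing `totalhits` should show which form actually matches Taj Mahal. As far as I know the regex form is case-sensitive by default, so a lowercase fragment may need the `i` flag (`intitle:/taj ma/i`).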