
Get Pages of Top Results from Search Using pymediawiki


I am trying to use the pymediawiki Python library to extract data from the MediaWiki API.

What I want to do is get the top 10 hits for a particular search term and then get the relevant page for each of these hits.

My code so far looks like this:

from mediawiki import MediaWiki

wiki = MediaWiki()

# Perform the search
search_results = wiki.search('washington', results=10)

# Retrieve the pages for the search results
pages = []
for result in search_results:
    page = wiki.page(result)
    pages.append(page)

# Print the titles of the retrieved pages
for page in pages:
    print(page.title)

However, with this approach I often run into a DisambiguationError.

Below is the error output for the search term used above:

DisambiguationError: 
"Washington" may refer to: 
  All pages with titles beginning with Washington
  All pages with titles containing Washington
  Boeing Washington
  Booker T. Washington High School (disambiguation)
  Cape Washington, Greenland
  Catarman, Northern Samar
  Central Washington Wildcats
  Eastern Washington Eagles
  Escalante, Negros Occidental
  Fort Washington (disambiguation)
  Fort Washington, Pennsylvania
  George Washington
  George Washington High School (disambiguation)
  George Washington University
  George Washington, Cuba
  Harold Washington College
  Lake Washington (disambiguation)
  Lake Washington High School
  Mahaica-Berbice
  Mount Washington (disambiguation)
  New Washington, Aklan
  Port Washington (disambiguation)
  SS Washington
  SS Washington (1941)
  San Jacinto, Masbate
  Surigao City
  USS Washington
  University of Mary Washington
  University of Washington
  Washington & Jefferson College
  Washington (footballer, born 1 April 1975)
  Washington (footballer, born 10 April 1975)
  Washington (footballer, born 1953)
  Washington (footballer, born 1985)
  Washington (footballer, born 1989)
  Washington (footballer, born August 1978)
  Washington (footballer, born May 1986)
  Washington (footballer, born November 1978)
  Washington (footballer, born November 1986)
  Washington (musician)
  Washington (name)
  Washington (state)
  Washington (steamboat 1851)
  Washington (tree)
  Washington Academy (disambiguation)
  Washington Avenue (disambiguation)
  Washington Boulevard (disambiguation)
  Washington Bridge (disambiguation)
  Washington Capitals
  Washington College
  Washington College (California)
  Washington College Academy
  Washington College of Law
  Washington College, Connecticut
  Washington Commanders
  Washington County (disambiguation)
  Washington County High School (disambiguation)
  Washington Court House, Ohio
  Washington Escarpment
  Washington F.C.
  Washington Female Seminary
  Washington High School (disambiguation)
  Washington Huskies
  Washington International School
  Washington International University
  Washington Island (French Polynesia)
  Washington Island (Kiribati)
  Washington Island (disambiguation)
  Washington Land
  Washington Medical College
  Washington Mystics
  Washington Nationals
  Washington Old Hall
  Washington Park (disambiguation)
  Washington School (disambiguation)
  Washington Square (Philadelphia)
  Washington Square (disambiguation)
  Washington Square West, Philadelphia
  Washington State Cougars
  Washington Street (disambiguation)
  Washington Township (disambiguation)
  Washington University Bears
  Washington University in St. Louis
  Washington University of Barbados
  Washington Valley (disambiguation)
  Washington Wizards
  Washington district (disambiguation)
  Washington metropolitan area
  Washington station (disambiguation)
  Washington, Alabama
  Washington, Arkansas
  Washington, California
  Washington, Connecticut
  Washington, D.C.
  Washington, Georgia
  Washington, Illinois
  Washington, Indiana
  Washington, Iowa
  Washington, Kansas
  Washington, Kentucky
  Washington, Louisiana
  Washington, Maine
  Washington, Massachusetts
  Washington, Michigan
  Washington, Mississippi
  Washington, Missouri
  Washington, Nebraska
  Washington, New Hampshire
  Washington, New Jersey
  Washington, New York
  Washington, North Carolina
  Washington, Oklahoma
  Washington, Ontario
  Washington, Pennsylvania
  Washington, Rhode Island
  Washington, Tyne and Wear
  Washington, Utah
  Washington, Vermont
  Washington, Virginia
  Washington, West Sussex
  Washington, West Virginia
  Washington, Wisconsin (disambiguation)
  Washington, Yolo County, California
  Washington-on-the-Brazos, Texas
  Washingtonian (disambiguation)
  Western Washington Vikings
  federal government of the United States

Is there a way around this so that I can get the pages I need?

I am open to other approaches as well.


Solution

  • You get the DisambiguationError because wiki.search returns the search term itself, 'Washington', as the first result. That title is a disambiguation page, so wiki.page raises the error instead of returning a page:

    >>> print(wiki.search('washington', results=10))
    
    ['Washington', 'Washington (state)', 'George Washington', 'Washington, D.C.', 'Washington Commanders', 'The Washington Post', 'Denzel Washington', 'University of Washington', 'Washington Capitals', 'Washington Wizards']
    

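    For this search term, the fix is to slice off the first element. Since we still want 10 pages, we request one extra result by passing results=11. Note that this only removes the first hit; any other result that happens to be a disambiguation page would still raise the error.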

    Fixed code

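    from mediawiki import MediaWiki

    wiki = MediaWiki()

    # Request 11 results and drop the first one ('Washington'),
    # which is the disambiguation page that triggers the error
    search_results = wiki.search('washington', results=11)[1:]

    # Retrieve the page for each remaining search result
    pages = []
    for title in search_results:
        pages.append(wiki.page(title))

    # Print the titles of the retrieved pages
    for page in pages:
        print(page.title)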
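
A more general variant, offered here only as a sketch and not as part of the original answer, is to catch the error for each title instead of assuming that only the first hit is a disambiguation page. The helper names fetch_pages and top_pages are hypothetical, and the over-fetch factor of 2 is an arbitrary assumption. DisambiguationError and PageError are imported from mediawiki.exceptions.

```python
def fetch_pages(wiki, titles, limit, skip_errors):
    """Collect up to `limit` pages, skipping titles whose lookup raises `skip_errors`.

    `wiki` is any object with a page(title) method (hypothetical helper,
    not part of pymediawiki).
    """
    pages = []
    for title in titles:
        if len(pages) >= limit:
            break
        try:
            pages.append(wiki.page(title))
        except skip_errors:
            # Disambiguation or missing page: move on to the next title
            continue
    return pages


def top_pages(term, count=10):
    """Return up to `count` non-disambiguation pages for `term` (hypothetical helper)."""
    # Imported here so fetch_pages can be used without the library installed
    from mediawiki import MediaWiki
    from mediawiki.exceptions import DisambiguationError, PageError

    wiki = MediaWiki()
    # Over-fetch (factor of 2 is an assumption) to leave room for skipped titles
    titles = wiki.search(term, results=count * 2)
    return fetch_pages(wiki, titles, count, (DisambiguationError, PageError))
```

Calling top_pages('washington') and printing page.title for each returned page would then replace the slicing workaround. If more than half of the search results are skipped, fewer than count pages come back.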