pythonweb-scrapingbeautifulsouptabular

BeautifulSoup: Get the contents of a specific table


My local airport disgracefully blocks users without IE, and looks awful. I want to write a Python scripts that would get the contents of the Arrival and Departures pages every few minutes, and show them in a more readable manner.

My tools of choice are mechanize for cheating the site to believe I use IE, and BeautifulSoup for parsing page to get the flights data table.

Quite honestly, I got lost in the BeautifulSoup documentation, and can't understand how to get the table (whose title I know) from the entire document, and how to get a list of rows from that table.

Any ideas?


Solution

  • This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table who's id is "Table1" and gets all of its tr elements.

    html = urllib2.urlopen(url).read()
    bs = BeautifulSoup(html)
    table = bs.find(lambda tag: tag.name=='table' and tag.has_attr('id') and tag['id']=="Table1") 
    rows = table.findAll(lambda tag: tag.name=='tr')