python web-scraping beautifulsoup

Python web scraping using Beautiful Soup on a messy site


I want to scrape the following three data points from this site: the percent verified, the numerical value for FAR, and the numerical value for POD. I'm trying to do this with BeautifulSoup, but I have little experience traversing a page's HTML, so I don't know how to locate those elements.

What is the easiest way to go about doing this?


Solution

  • I ended up solving it myself. I used a strategy similar to isedev's, but I was hoping to find a better way of getting the 'Verified' data:

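    import urllib2
    from bs4 import BeautifulSoup
    
    def main():
        # Read the WFO identifiers, one per line, skipping blank lines (not used further in this snippet).
        with open(r'C:\Python27\wfo.txt') as f:
            wfo = [line.strip() for line in f if line.strip()]
        # No parser is named, so bs4 picks the best one installed; the indices below depend on that choice.
        soup = BeautifulSoup(urllib2.urlopen('http://mesonet.agron.iastate.edu/cow/?syear=2009&smonth=9&sday=12&shour=12&eyear=2012&emonth=9&eday=12&ehour=12&wfo=ABQ&wtype%5B%5D=TO&hail=1.00&lsrbuffer=15&ltype%5B%5D=T&wind=58').read())
        elements = soup.find_all('span')
        find_verify = soup.find_all('th')
    
        # Positional lookups: these indices only hold for the page's current layout.
        far = float(elements[1].text)
        pod = float(elements[2].text)
        verified = find_verify[13].text[:-1]  # drop the last character (presumably the '%' sign)
        return far, pod, verified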
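
As for a less brittle way of getting the 'Verified' value: one option is to look cells up by their label text instead of by position in the `find_all` result. The following is only a minimal sketch, written for Python 3. It assumes each label sits in a `th` or `td` that is immediately followed by the cell holding its value. The sample HTML and the helper name `value_after_label` are made up for illustration and are not taken from the real page, so check the actual markup before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page's layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Verified</th><th>62%</th></tr>
  <tr><th>FAR</th><td><span>0.38</span></td></tr>
  <tr><th>POD</th><td><span>0.75</span></td></tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the cell right after the first cell containing `label`.

    Returns None when no such label (or no following cell) is found.
    """
    for cell in soup.find_all(["th", "td"]):
        if label in cell.get_text():
            # Skip whitespace nodes and take the next th/td at the same level.
            following = cell.find_next_sibling(["th", "td"])
            if following is not None:
                return following.get_text(strip=True)
    return None


# html.parser ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

verified = float(value_after_label(soup, "Verified").rstrip("%"))
far = float(value_after_label(soup, "FAR"))
pod = float(value_after_label(soup, "POD"))

print(verified, far, pod)
```

If the page's layout changes, a lookup like this returns `None` for a missing label instead of silently picking up the wrong cell, which makes the failure easier to spot than with a hard-coded index.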