python web-scraping beautifulsoup

How can I scrape Bank of America for business hours?


Hi, I was wondering how I can use BeautifulSoup to scrape Bank of America for its hours. For example, if the URL is http://locators.bankofamerica.com/locator/locator/2129_Shattuck_Ave_94704_BERKELEY_CA/bank_branch_locations/, how can I extract only the hours? Below is my initial attempt, but it returns nothing.

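import urllib2
from bs4 import BeautifulSoup
url = "http://locators.bankofamerica.com/locator/locator/2129_Shattuck_Ave_94704_BERKELEY_CA/bank_branch_locations/"
page = urllib2.urlopen(url)  # follows HTTP redirects automatically
soup = BeautifulSoup(page.read(), "html.parser")
hours = soup.find_all("div", class_="lobbyHours")  # every <div class="lobbyHours">
print hours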

Solution

  • That URL redirects, which is why soup.find_all("div", class_="lobbyHours") returns nothing: urllib2 follows the redirect automatically, and there is no div with that class on the page you are redirected to.

    By monitoring the network traffic with Firebug in Firefox, I found that the URL you are requesting returns a 301 Moved Permanently status code. Fortunately, a 301 response carries a Location header that gives the new address. In this case:

    'http://locators.bankofamerica.com/locator/locator/LocatorAction.do?shouldTest=true'
    

    This is the branch-locator page. You will have to start at this page, programmatically 'search' for the location(s) you want, find the appropriate link in the results, and perform a third request to fetch it.

    The site also uses cookies, so look into cookielib.
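
To make the two points above concrete, here is a minimal sketch of how the redirect could be confirmed from Python and how cookies could be kept across the follow-up requests. It is written for Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar; on Python 2 the same classes live under the old module names. The names NoRedirect, check_redirect and make_session are made up for this example, and the search step on the locator page is left out because its form parameters are not known here.

```python
import http.cookiejar
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect (hypothetical helper)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect;
        # the opener then raises HTTPError for the 3xx response.
        return None


def check_redirect(url):
    """Return (status code, Location header) for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        response = opener.open(url)
    except urllib.error.HTTPError as err:
        # A 3xx (or any other non-2xx) response ends up here.
        return err.code, err.headers.get("Location")
    return response.status, None


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Calling check_redirect(url) with the URL from the question should show the 301 status and the Location value quoted above, assuming the site still behaves as it did when this was written. For the actual scrape, call make_session() once and use the returned opener's open() for all three requests, so that any cookies set by the locator page are sent along with the later ones.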