I'm having trouble figuring out what code I need to write to make Python try the next URL in my CSV file. Each URL is on its own line, like this:
http://www.indexedamerica.com/states/PR/Adjuntas/Restaurants-Adjuntas-00601.html
http://www.indexedamerica.com/states/PR/Aguada/Restaurants-Aguada-00602.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00603.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00604.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00605.html
http://www.indexedamerica.com/states/PR/Maricao/Restaurants-Maricao-00606.html
http://www.indexedamerica.com/states/MI/Kent/Restaurants-Grand-Rapids-49503.html
#open csv file
#read csv file line by line
#Pass each line to beautiful soup to try
#If URL raises a 404 error continue to next line
#extract tables from url
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import csv
mech = Browser()
indexed = open('C://python27/longlist.csv')
reader = csv.reader(indexed)
html = mech.open(reader)
for line in html:
    try:
        mechanize.open(html)
        table = soup.find("table", border=3)
    else:
        #!!!! try next url from file. How do I do this?

for row in table.findAll('tr')[2:]:
    col = row.findAll('td')
    BusinessName = col[0].string
    Phone = col[1].string
    Address = col[2].string
    City = col[3].string
    State = col[4].string
    Zip = col[5].string
    Restaurantinfo = (BusinessName, Phone, Address, City, State)
    print "|".join(Restaurantinfo)
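for line in reader:
    try:
        # csv.reader yields each row as a list; the URL is the first field
        url = line[0]
        page = mech.open(url)
        soup = BeautifulSoup(page.read())
        table = soup.find("table", border=3)
    except Exception:
        # this URL could not be loaded or parsed (e.g. a 404), so try the next one
        continue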
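Here is a slightly fuller sketch of the same try/except pattern, with the download step passed in as a function so the loop can be tried without touching the network. The names scrape_tables, fetch and fake_fetch are made up for illustration and are not part of mechanize or BeautifulSoup; in the real script, fetch would presumably wrap the mech.open(url) call and the table parsing.

```python
def scrape_tables(urls, fetch):
    """Return (url, result) pairs for every URL that fetch() could load.

    fetch is a hypothetical callable: it takes a URL and either returns
    a result or raises an exception (for example on a 404).
    """
    results = []
    for url in urls:
        try:
            result = fetch(url)
        except Exception:
            # This URL failed; skip it and move on to the next one.
            continue
        results.append((url, result))
    return results


def fake_fetch(url):
    # Stand-in for the real download: pretend one page is missing.
    if "missing" in url:
        raise IOError("HTTP Error 404: Not Found")
    return "table from " + url


pages = scrape_tables(
    ["http://example.com/a.html",
     "http://example.com/missing.html",
     "http://example.com/b.html"],
    fake_fetch,
)
```

With the stand-in above, pages ends up holding only the two URLs that loaded; the missing one is skipped silently. Catching Exception is very broad, so in practice you may want to catch only the HTTP error your library raises.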
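Alternatively, you could check the status code of the page and skip the URL if you receive a 404 (inside the for loop, with urllib imported):
    # getcode() returns the status as an integer, so compare against 404, not '404'
    if urllib.urlopen(url).getcode() == 404:
        continue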
continue, when used inside a loop, skips the rest of the code for the current iteration and moves on to the next entry in the loop.
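To see what continue does in the status-code variant, here is a small sketch that can be run offline. The helper urls_to_scrape and the get_status argument are hypothetical; in the real script get_status would be something along the lines of urllib.urlopen(url).getcode(), and the statuses below are invented for the example.

```python
def urls_to_scrape(urls, get_status):
    """Return the URLs whose status code is not 404.

    get_status is a hypothetical callable mapping a URL to an integer
    HTTP status code.
    """
    good = []
    for url in urls:
        if get_status(url) == 404:
            # Nothing below this line runs for a 404; the loop
            # jumps straight to the next URL.
            continue
        good.append(url)
    return good


# Invented status codes standing in for real HTTP responses.
statuses = {
    "http://example.com/a.html": 200,
    "http://example.com/b.html": 404,
    "http://example.com/c.html": 200,
}

kept = urls_to_scrape(sorted(statuses), statuses.get)
```

Here kept contains a.html and c.html only, because the append line is never reached for the URL that returned 404.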