I have a list of URLs and I'm scraping the title of each page by looping over the list.
The problem is that whenever a URL in the list is invalid, the code breaks. I'm trying to use try and except to skip the error, but it isn't working.
Below is the code I'm using (please correct me if I'm missing something):
import requests
from bs4 import BeautifulSoup as BS

url_list = ['http://www.aurecongroup.com',
            'http://www.bendigoadelaide.com.au',
            'http://www.burrell.com.au',
            'http://www.dsdbi.vic.gov.au',
            'http://www.energyaustralia.com.au',
            'http://www.executiveboard.com',
            'http://www.mallesons.com',
            'https://www.minterellison.com',
            'http://www.mta.org.nz',
            'http://www.services.nsw.gov.au']

for link in url_list:
    try:
        r = requests.get(link)
        r.encoding = 'utf-8'
        html_content = r.text
        soup = BS(html_content, 'lxml')
        df = soup.title.string
        print(df)
    except IOError:
        pass
Executing the above code gives me AttributeError: 'NoneType' object has no attribute 'string'.
Can someone help me with this?
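except IOError only catches I/O errors such as a failed request. The AttributeError is raised because soup.title is None when a page has no title tag, and AttributeError is not a subclass of IOError, so it needs its own handler. How about this: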
import requests
from bs4 import BeautifulSoup

url_list = [
    'http://www.aurecongroup.com',
    'http://www.bendigoadelaide.com.au',
    'http://www.burrell.com.au',
    'http://www.dsdbi.vic.gov.au',
    'http://www.energyaustralia.com.au',
    'http://www.executiveboard.com',
    'http://www.mallesons.com',
    'https://www.minterellison.com',
    'http://www.mta.org.nz',
    'http://www.services.nsw.gov.au'
]

for link in url_list:
    try:
        res = requests.get(link)
        soup = BeautifulSoup(res.text, 'lxml')
        try:
            df = soup.title.string.strip()
        except (AttributeError, KeyError):
            df = ""
        print(df)
    except IOError:
        pass
Partial output, including a blank line for a page with no title:
Aurecon – A global engineering and infrastructure advisory company
(blank line: this page has no title, so an empty string is printed)
Stockbroking & Superannuation Brisbane | Burrell
Home | Economic Development
Electricity Providers - Gas Suppliers | EnergyAustralia
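If you would rather not rely on catching AttributeError, an explicit None check is one alternative. The following is only a sketch under a few assumptions: the helper names extract_title and fetch_title are made up for illustration, the 10-second timeout and the raise_for_status() call are my additions (they treat HTTP error pages as failures, which the code above does not), and the parser is 'html.parser' rather than 'lxml' so the snippet has no extra dependency.

```python
import requests
from bs4 import BeautifulSoup


def extract_title(html):
    """Hypothetical helper: return the stripped <title> text, or "" if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag, and
    # .string is None when the tag is empty or contains nested markup.
    if soup.title is None or soup.title.string is None:
        return ""
    return soup.title.string.strip()


def fetch_title(url, timeout=10):
    """Hypothetical helper: return the page title, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Assumption: 4xx/5xx responses should count as failures too.
        response.raise_for_status()
    except requests.exceptions.RequestException:
        # Base class for connection errors, timeouts, invalid URLs and HTTP errors.
        return None
    return extract_title(response.text)


if __name__ == "__main__":
    for link in ["http://www.aurecongroup.com", "http://www.burrell.com.au"]:
        title = fetch_title(link)
        print(link, "->", "request failed" if title is None else repr(title))
```

Returning None for a failed request and "" for a missing title keeps the two cases distinguishable, which a bare pass does not.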