
AttributeError: 'NoneType' object has no attribute 'string'


I have a list of URLs, and I'm scraping the title of each page by looping over the list.

The problem is that the code breaks whenever a URL in the list is invalid. I tried wrapping the request in try/except to skip the error, but the try/except is not catching it.

Below is the code I'm using (please correct me if I'm missing something):

    import requests
    from bs4 import BeautifulSoup as BS
    url_list = ['http://www.aurecongroup.com',
    'http://www.bendigoadelaide.com.au',
    'http://www.burrell.com.au',
    'http://www.dsdbi.vic.gov.au',
    'http://www.energyaustralia.com.au',
    'http://www.executiveboard.com',
    'http://www.mallesons.com',
    'https://www.minterellison.com',
    'http://www.mta.org.nz',
    'http://www.services.nsw.gov.au']

    for link in url_list:
        try:
            r = requests.get(link)
            r.encoding = 'utf-8'
            html_content = r.text
            soup = BS(html_content, 'lxml')
            df = soup.title.string
            print(df)

        except IOError:
            pass

Running the code above raises AttributeError: 'NoneType' object has no attribute 'string'. Can someone help me with this?


Solution

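  • soup.title is None when a page has no <title> tag, so accessing .string on it raises AttributeError, which except IOError does not catch. How about handling that case separately: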

    import requests
    from bs4 import BeautifulSoup
    
    url_list = [
        'http://www.aurecongroup.com',
        'http://www.bendigoadelaide.com.au',
        'http://www.burrell.com.au',
        'http://www.dsdbi.vic.gov.au',
        'http://www.energyaustralia.com.au',
        'http://www.executiveboard.com',
        'http://www.mallesons.com',
        'https://www.minterellison.com',
        'http://www.mta.org.nz',
        'http://www.services.nsw.gov.au'
        ]
    
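    for link in url_list:
        try:
            res = requests.get(link)
            soup = BeautifulSoup(res.text, 'lxml')
            try:
                df = soup.title.string.strip()
            except AttributeError:
                # No <title> tag (or an empty one): fall back to an empty string.
                df = ""

            print(df)
        except IOError:
            # requests' exceptions derive from IOError, so unreachable URLs are skipped.
            pass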
    

    Partial output, including a page with no title:

    Aurecon – A global engineering and infrastructure advisory company
    (empty line: no title was found for this page)
    Stockbroking & Superannuation Brisbane | Burrell
    Home | Economic Development
    Electricity Providers - Gas Suppliers | EnergyAustralia
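
If the nested try/except feels heavy, the same guard could also be written as explicit None checks. Below is a minimal sketch, assuming a hypothetical helper named title_or_default (it is not part of BeautifulSoup). The SimpleNamespace objects are only stand-ins that mimic the .title and .string attributes of a soup object, so the snippet runs on its own without network access:

```python
from types import SimpleNamespace


def title_or_default(soup, default=""):
    """Return the stripped page title, or `default` when it is missing.

    Hypothetical helper: `soup` is assumed to behave like a BeautifulSoup
    object, where `soup.title` is None if the page has no <title> tag and
    `soup.title.string` is None if the tag is empty.
    """
    title = getattr(soup, "title", None)
    text = getattr(title, "string", None)
    if text is None:
        return default
    return text.strip()


# Stand-ins for parsed pages (assumptions for illustration only).
with_title = SimpleNamespace(title=SimpleNamespace(string="  Home | Example  "))
no_title_tag = SimpleNamespace(title=None)
empty_title = SimpleNamespace(title=SimpleNamespace(string=None))

print(repr(title_or_default(with_title)))    # 'Home | Example'
print(repr(title_or_default(no_title_tag)))  # ''
print(repr(title_or_default(empty_title)))   # ''
```

In the loop above, df = title_or_default(soup) would take the place of the inner try/except, while the outer except IOError still handles URLs that cannot be fetched.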