python, error-handling, scraperwiki

What is the pythonic way to catch errors and keep going in this loop?


I've got two functions that work just fine, but seem to break down when I run them nested together.

def scrape_all_pages(alphabet):
    pages = get_all_urls(alphabet)
    for page in pages:
        scrape_table(page)

I'm trying to systematically scrape some search results. get_all_urls() creates a list of URLs for each letter in the alphabet; sometimes there are thousands of pages, but that works just fine. Then, for each page, scrape_table() scrapes just the table I'm interested in. That also works fine. I can run the whole thing and it works fine, but I'm working in ScraperWiki, and if I set it to run and walk away it invariably gives me a "list index out of range" error. This is definitely an issue within ScraperWiki, but I'd like to find a way to zero in on the problem by adding some try/except clauses and logging errors when I encounter them. Something like:

def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
    except:
        ## LOG THE ERROR IF THAT FAILS.
    try:
        for page in pages:
            scrape_table(page)
    except:
        ## LOG THE ERROR IF THAT FAILS

I haven't been able to figure out how to generically log errors, though. Also, the above looks clunky, and in my experience, when something looks clunky, Python has a better way. Is there a better way?


Solution

  • It is a good approach, but you should not use a bare except clause; you have to specify the type of exception you are trying to catch. You can also catch the error inside the loop and continue with the remaining pages:

    def scrape_all_pages(alphabet):
        try:
            pages = get_all_urls(alphabet)
        except IndexError:  # IndexError is an example
            ## LOG THE ERROR IF THAT FAILS.
            return          # without pages there is nothing to loop over

        for page in pages:
            try:
                scrape_table(page)
            except IndexError:  # IndexError is an example
                ## LOG THE ERROR IF THAT FAILS, then let the loop continue
                continue
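
    To answer the "how do I generically log errors" part: one option is the standard library's logging module. The sketch below is just one way to wire it in, assuming get_all_urls and scrape_table are the functions from the question and that writing to a file named scrape_errors.log is acceptable; the broad except Exception is only there to diagnose the unknown failure and can be narrowed once you know what is actually raised:

    import logging

    # send everything to a log file you can inspect after walking away
    logging.basicConfig(filename='scrape_errors.log', level=logging.INFO)

    def scrape_all_pages(alphabet):
        try:
            pages = get_all_urls(alphabet)
        except Exception:
            # logging.exception records the message plus the full traceback
            logging.exception("get_all_urls failed for %r", alphabet)
            return

        for page in pages:
            try:
                scrape_table(page)
            except Exception:
                logging.exception("scrape_table failed for %s", page)
                continue

    Because logging.exception writes the traceback as well as the message, the log will show which URL was being scraped and the exact line that raised "list index out of range".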