Tags: python, multithreading, threadpool, python-multithreading, concurrent.futures

How to re-execute function in ThreadPoolExecutor in case of error?


I'm trying the ThreadPoolExecutor example from the concurrent.futures section of the Python documentation. Specifically:

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

How can I re-send a call to the executor if an exception is generated inside the concurrent.futures.as_completed(future_to_url) loop? Basically I simply want to retry failed jobs.

I've tried simply calling something like executor.submit(...) but it doesn't seem to work.

Any suggestions are welcome, thanks.


Solution

  • A Future that failed is already completed: the exception is its result. You therefore cannot re-execute it; you must submit the call again, which creates a new Future.

    Since you have the URL, you can resubmit it in the error handler and add the new Future to a separate collection. Once all the Futures in future_to_url have completed, iterate over that collection of new Futures.
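
The approach above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: run_with_retries and its max_retries parameter are hypothetical names, not part of concurrent.futures, and the sketch assumes the items (for example URLs) are hashable and that a small fixed number of retries is acceptable.

```python
import concurrent.futures


def run_with_retries(fn, items, max_retries=2, max_workers=5):
    """Call fn(item) for each item in a thread pool, resubmitting failures.

    Hypothetical helper: returns (results, errors), two dicts keyed by item.
    """
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each pending future to (item, number of attempts so far).
        pending = {executor.submit(fn, item): (item, 1) for item in items}
        while pending:
            retries = {}  # separate collection for the resubmitted calls
            for future in concurrent.futures.as_completed(pending):
                item, attempt = pending[future]
                try:
                    results[item] = future.result()
                except Exception as exc:
                    if attempt <= max_retries:
                        # A failed Future cannot be rerun; submit a new one.
                        retries[executor.submit(fn, item)] = (item, attempt + 1)
                    else:
                        errors[item] = exc
            # Next round: wait only on the newly created futures.
            pending = retries
    return results, errors
```

With the load_url and URLS from the question, it could be called as results, errors = run_with_retries(lambda url: load_url(url, 60), URLS). The new Futures go into a separate dict because as_completed only watches the futures it was given when it was called, so futures submitted inside the loop would otherwise never be waited on.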