Tags: python, parallel-processing, list-comprehension, python-asyncio, set-comprehension

How to use parallelization in set/list comprehension using asyncio?


I want to create a multiprocess comprehension in Python 3.7.

Here's the code I have:

import asyncio

import requests

async def _url_exists(url):
  """Check whether a url is reachable"""
  request = requests.get(url)
  return request.status_code == 200

async def _remove_unexisting_urls(rows):
  return {row for row in rows if await _url_exists(row)}

rows = [
  'http://example.com/',
  'http://example.org/',
  'http://foo.org/',
]
rows = asyncio.run(_remove_unexisting_urls(rows))

In this code example, I want to remove non-existing URLs from a list. (Note that I'm using a set instead of a list because I also want to remove duplicates).

My issue is that the execution is still sequential: each HTTP request blocks until it completes, so the total run time is the same as a plain serial loop.


Solution

  • asyncio by itself doesn't make blocking calls such as requests.get run concurrently; awaiting a coroutine that blocks still stalls the event loop. However, with the multiprocessing module's Pool.map, you can run the checks in worker processes:

    from multiprocessing.pool import Pool
    
    import requests
    
    def fetch(url):
        request = requests.get(url)
        return request.status_code == 200
    
    rows = [
      'http://example.com/',
      'http://example.org/',
      'http://foo.org/',
    ]
    
    if __name__ == '__main__':
        # The guard is required on platforms that spawn workers (Windows, macOS).
        with Pool() as pool:
            # pool.map returns booleans; keep the URLs whose check succeeded,
            # as a set to drop duplicates.
            rows = {url for url, ok in zip(rows, pool.map(fetch, rows)) if ok}
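If you would rather stay within asyncio (as the question asks), you can push each blocking check onto a thread pool with `loop.run_in_executor` and await them all at once with `asyncio.gather`; this works on Python 3.7. A minimal sketch, where `slow_check` is a made-up stand-in for `requests.get` (it sleeps and applies an arbitrary reachability rule) so the example runs without network access:

```python
import asyncio
import time

def slow_check(url):
    """Stand-in for a blocking requests.get call: sleeps 0.2 s,
    then reports 'reachability' by a hypothetical rule."""
    time.sleep(0.2)
    return not url.startswith('http://foo')

async def remove_unreachable(urls):
    loop = asyncio.get_running_loop()
    # Submit every blocking check to the default thread pool at once,
    # then await all of them concurrently with gather().
    results = await asyncio.gather(
        *(loop.run_in_executor(None, slow_check, url) for url in urls)
    )
    # Keep only the URLs whose check succeeded, as a set to drop duplicates.
    return {url for url, ok in zip(urls, results) if ok}

urls = [
    'http://example.com/',
    'http://example.org/',
    'http://foo.org/',
]
start = time.monotonic()
reachable = asyncio.run(remove_unreachable(urls))
elapsed = time.monotonic() - start
# The three 0.2 s checks overlap, so elapsed is ~0.2 s rather than ~0.6 s.
```

For real HTTP checks you would replace `slow_check` with a `requests.get` wrapper (threads are fine, since the work is I/O-bound), or use a natively async client such as aiohttp.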