I want to create a multiprocess comprehension in Python 3.7.
Here's the code I have:
import asyncio

import requests

async def _url_exists(url):
    """Check whether a url is reachable"""
    request = requests.get(url)
    return request.status_code == 200
async def _remove_unexisting_urls(rows):
    return {row for row in rows if await _url_exists(row)}
rows = [
'http://example.com/',
'http://example.org/',
'http://foo.org/',
]
rows = asyncio.run(_remove_unexisting_urls(rows))
In this code example, I want to remove unreachable URLs from a list. (Note that I'm building a set instead of a list because I also want to remove duplicates.)
My issue is that execution is still sequential: each HTTP request blocks until it completes, so the total run time is the same as a plain serial version.
asyncio doesn't run your coroutines in parallel by itself, and requests.get is a blocking call, so each await in your comprehension finishes before the next one starts. With the multiprocessing module's Pool.map, however, you can schedule the checks to run in separate worker processes:
from multiprocessing.pool import Pool

import requests

def fetch(url):
    """Check whether a url is reachable."""
    request = requests.get(url)
    return request.status_code == 200

rows = [
    'http://example.com/',
    'http://example.org/',
    'http://foo.org/',
]

if __name__ == '__main__':
    with Pool() as pool:
        # Pool.map preserves input order, so zip pairs each URL with its
        # result; keep only the URLs whose check returned True.
        rows = [url for url, exists in zip(rows, pool.map(fetch, rows)) if exists]
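If you'd rather stay with asyncio, a common workaround (not shown in the original code) is to push the blocking requests.get calls onto an executor with loop.run_in_executor and start them all at once with asyncio.gather. A minimal sketch, reusing the same rows list:

import asyncio

import requests

async def _url_exists(url):
    """Run the blocking requests.get in a worker thread."""
    loop = asyncio.get_running_loop()
    request = await loop.run_in_executor(None, requests.get, url)
    return request.status_code == 200

async def _remove_unexisting_urls(rows):
    # gather() starts all checks concurrently instead of awaiting them one by one.
    results = await asyncio.gather(*(_url_exists(row) for row in rows))
    return {row for row, exists in zip(rows, results) if exists}

rows = [
    'http://example.com/',
    'http://example.org/',
    'http://foo.org/',
]

rows = asyncio.run(_remove_unexisting_urls(rows))

With executor=None, run_in_executor uses a default ThreadPoolExecutor, which is usually enough for I/O-bound work like HTTP checks; a process pool buys you little here because the tasks spend their time waiting on the network rather than on the CPU.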