python-3.x · web-scraping · async-await · python-requests-html

How do I run multiple GET requests with coroutine properly?


I'm learning requests-html and would like to know how to run multiple tasks asynchronously.

When I try to run multiple requests asynchronously with requests-html, I get a TypeError saying 'coroutine' object is not callable, along with a RuntimeWarning that coroutine 'async_get_url' was never awaited:

Traceback (most recent call last):
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/4-projects-workflow/.web_scrapping/.my_sdk/./main.py", line 21, in <module>
    result = a_hs.run(*[async_get_url(url) for url in urls])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/.py_env/lib64/python3.11/site-packages/requests_html.py", line 771, in run
    tasks = [
            ^
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/.py_env/lib64/python3.11/site-packages/requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
                          ^^^^^^
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'async_get_url' was never awaited

Here is my code:

#!/usr/bin/env python

from requests_html import HTMLSession as hs, AsyncHTMLSession as a_hs
from pprint import pprint
from typing import Any as any

def get_url(url: str) -> any:
  s = hs()
  r = s.get(url)
  
  return r

async def async_get_url(url: str) -> any:
  s = a_hs()
  r = await s.get(url)
  
  return r

if __name__ == "__main__":
  urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')
  result = a_hs.run(*[async_get_url(url) for url in urls])  # <- raises the TypeError above

  pprint(result)


Solution

  • If I understand you correctly, you're looking for something like this:

    from requests_html import AsyncHTMLSession
    from pprint import pprint
    from typing import Any

    async def get_url(url: str) -> Any:
        print(url)
        return await session.get(url)

    session = AsyncHTMLSession()

    urls = ["https://python.org/", "https://reddit.com/", "https://google.com/"]
    # Wrap each call in a zero-argument lambda so run() receives callables,
    # not already-created coroutine objects; url=url binds each URL now.
    coroutines = [lambda url=url: get_url(url) for url in urls]

    result = session.run(*coroutines)
    pprint(result)
    

    The list comprehension builds a list of zero-argument callables, not coroutine objects; the lambda matters because run() calls each item itself (asyncio.ensure_future(coro()), as your traceback shows), which is exactly why passing already-created coroutines failed with TypeError: 'coroutine' object is not callable. The url=url default binds each URL at definition time, and the * unpacks the list into separate arguments for the run() method.
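    Why the url=url default is needed: without it, each lambda would look up url only when run() finally calls it, after the loop has finished, so every lambda would fetch the last URL. A minimal sketch of that late-binding pitfall, independent of requests-html:

    late = [lambda: u for u in ("a", "b", "c")]
    bound = [lambda u=u: u for u in ("a", "b", "c")]

    print([f() for f in late])   # ['c', 'c', 'c'] - every lambda sees the final u
    print([f() for f in bound])  # ['a', 'b', 'c'] - defaults are captured per iteration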

    Output:

    https://python.org/
    https://reddit.com/
    https://google.com/
    [<Response [200]>, <Response [200]>, <Response [200]>]
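
    If the default-argument trick feels opaque, functools.partial builds the same kind of zero-argument callables more explicitly. This is an equivalent sketch reusing the get_url, session, and urls defined above, not a different requests-html API:

    from functools import partial

    # partial(get_url, url) is a callable; run() calls it to get the coroutine.
    coroutines = [partial(get_url, url) for url in urls]
    result = session.run(*coroutines)
    pprint(result)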