python, python-asyncio, aiohttp

Keep aiohttp session alive


I'm trying to visit a website every X seconds with parallel and separate sessions, then analyse the response to decide whether each session should continue. However, once the code reaches the second iteration of the while loop, it fails.

import asyncio
from aiohttp import ClientSession
import logging
import time

interval = 30
instances = 2
visit_url = 'http://www.example.org'

tasks = []

logging.basicConfig(
    format='%(asctime)s.%(msecs)03d %(message)s',  # Log in format time.milliseconds {message}
    level=logging.INFO,  # Use with logging.info()
    datefmt='%H:%M:%S')  # Display time as Hours:Minutes:Seconds


class StopException(Exception):
    pass


async def quit_app(session, task_, reason):
    logging.info("[{}] {}.".format(task_, reason))
    session.cookies.clear()  # Reset cookies
    session.headers.clear()  # Reset headers
    session.close()  # End HTTP connection
    raise StopException


async def get_status(response):
    if "abow" in response:
        return "success"
    elif "odoap" in response or "daoscp" in response:  # test each marker separately; a bare "odoap" is always truthy
        return "waiting"
    elif "nullt" in response:
        return "fail"
    elif "issue" in response:
        return "banned"
    elif "pending" in response:
        return "pending"
    else:
        return "pending"


async def initialise(headers):
    session = ClientSession()
    task_ = len(asyncio.Task.all_tasks()) - instances - 1
    passed = False
    while passed is False:
        async with session as session:
            async with session.get(visit_url, headers=headers, allow_redirects=True) as initial:
                status = await get_status(await initial.text())  # Check HTML for status
                if status == "success":
                    logging.info("[{}] {}.".format(task_, "Success"))
                    passed = True
                elif status == "pending":
                    logging.info("[{}] {}.".format(task_, "Pending.."))
                    await asyncio.sleep(interval)
                elif status == "waiting":
                    logging.info("[{}] {}.".format(task_, "Waiting..."))
                    await asyncio.sleep(interval)
                elif status == "banned":
                    await quit_app(initial, task_, "Banned")
                elif status == "fail":
                    await quit_app(initial, task_, "Failed")


if __name__ == "__main__":
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
    }  # Add appropriate headers
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    loop = asyncio.get_event_loop()
    for i in range(instances):
        task = asyncio.ensure_future(initialise(headers))
        tasks.append(task)
    loop.run_until_complete(asyncio.wait(tasks))
    end = time.perf_counter()
    print("Process took {0:.4f} seconds.".format(end - start))

This code returns the following error:

13:56:58.604 Task exception was never retrieved
future: <Task finished coro=<initialise() done, defined at C:/Users/x/PycharmProjects/tests/src/aiohttp_main.py:49> exception=RuntimeError('Session is closed',)>
RuntimeError: Session is closed

I just can't work out how to make the sessions stay alive until I .close() them...


Solution

  • I just can't work out how to make the sessions stay alive until I .close() them...

    The async with statement closes the session when its body finishes executing. This means that after you write:

    async with session as session:
        # ... use session
    # here session is closed
    

    ...you can no longer use session once the body of async with has finished executing. That's a feature: it allows the resources associated with the session to be cleaned up in a predictable and timely fashion. It's not specific to aiohttp either; it's how with generally works in Python. For example, when working with files, with signals that the file is to be closed at the end of the statement:

    with open('data.csv') as fileobj:
        # ... read stuff from fileobj
    
    # outside the "with" block, fileobj is closed and you
    # can no longer read from it
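
To make the file analogy concrete, here is a minimal sketch that can be run as-is. The temporary file and the variable names are only for illustration and are not part of the question's code.

```python
import os
import tempfile

# Create a small throwaway CSV file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.write("a,b\n1,2\n")

with open(path) as fileobj:
    first_line = fileobj.readline()  # works: the file is open inside the block

is_closed = fileobj.closed  # True once the "with" block has exited

try:
    fileobj.read()  # reading after the block raises ValueError
    error = None
except ValueError as exc:
    error = str(exc)

os.remove(path)
print(first_line.strip(), is_closed, error)
```

The name fileobj still exists after the block, but the object it refers to is closed, which is the same situation the session ends up in.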
    

    The behavior of async with is specified in PEP 492. In the case of aiohttp, async with on the ClientSession object calls its __aexit__() method when the block exits, which executes await self._close(), as seen in the source code.
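
The failure in the question can be reproduced without any network access. The sketch below uses FakeSession, a hypothetical stand-in written for this example (it is not aiohttp's class), whose __aexit__() marks the object as closed, roughly as described above. Entering async with a second time succeeds, but the first request inside it fails.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a session; not aiohttp's ClientSession."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the "async with" body closes the session for good.
        self.closed = True

    async def get(self, url):
        if self.closed:
            raise RuntimeError("Session is closed")
        return "response from " + url


async def demo():
    session = FakeSession()
    outcomes = []
    for _ in range(2):  # mirrors two iterations of the question's while loop
        try:
            async with session as s:
                outcomes.append(await s.get("http://www.example.org"))
        except RuntimeError as exc:
            outcomes.append(str(exc))
    return outcomes


outcomes = asyncio.run(demo())
print(outcomes)
```

The first iteration succeeds and the second reports "Session is closed", which matches the error message in the question.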

    For your situation the fix is simple enough: move the async with outside the while loop, so that the session is opened once and stays open across iterations. For example:

    async def initialise(headers):
        async with ClientSession() as session:
            # the rest of the code, including the `while` loop, here
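
A fuller sketch of the restructured function might look like the following. It rests on several assumptions that are not in the original code: session_factory is a hypothetical parameter (pass aiohttp.ClientSession in real use) so the function can be exercised without a network, classify is a simplified stand-in for get_status, and the function returns the final status instead of raising StopException.

```python
import asyncio


def classify(html):
    """Simplified stand-in for get_status; the markers come from the question."""
    if "abow" in html:
        return "success"
    if "nullt" in html:
        return "fail"
    if "issue" in html:
        return "banned"
    return "pending"


async def initialise(session_factory, url, headers, interval):
    # The session is opened once, outside the loop, and is closed only
    # when the whole function is done.
    async with session_factory() as session:
        while True:
            # Each request gets its own "async with", which releases only
            # the response, not the session.
            async with session.get(url, headers=headers) as response:
                status = classify(await response.text())
            if status != "pending":
                return status  # success, fail, or banned: stop polling
            await asyncio.sleep(interval)
```

With aiohttp installed, this could be started with something like asyncio.run(initialise(aiohttp.ClientSession, visit_url, headers, interval)) on Python 3.7 or later.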
    

    On an unrelated note, you probably want to replace len(asyncio.Task.all_tasks()) with a global counter of your own. Using Task.all_tasks() in this way can start producing incorrect results if you later incorporate other unrelated tasks into the event loop (or a third-party library does that for you). Note also that Task.all_tasks() was deprecated in Python 3.7 and removed in Python 3.9 in favor of asyncio.all_tasks().
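
One way to build such a counter is with itertools.count, as in this minimal sketch. The names _task_ids, worker, and main are made up for the example; the real initialise() would take the place of worker.

```python
import asyncio
import itertools

# Module-level counter: each call to next() hands out a fresh id,
# independent of how many other tasks exist on the event loop.
_task_ids = itertools.count(1)


async def worker(results):
    task_id = next(_task_ids)  # claim an id before the first await
    await asyncio.sleep(0)     # placeholder for the real work
    results.append(task_id)


async def main(instances):
    results = []
    await asyncio.gather(*(worker(results) for _ in range(instances)))
    return results


ids = asyncio.run(main(3))
print(sorted(ids))
```

Because the event loop runs in a single thread, next(_task_ids) needs no locking here.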