The documentation of asyncio.create_task()
states the following warning:
Important: Save a reference to the result of this function, to avoid a task disappearing mid execution. (source)
My question is: Is this really true?
I have several IO bound "fire and forget" tasks which I want to run concurrently using asyncio
by submitting them to the event loop using asyncio.create_task()
. However, I do not really care for the return value of the coroutine or even if they run successfully, only that they do run eventually. One use case is writing data from an "expensive" calculation back to a Redis data base. If Redis is available, great. If not, oh well, no harm. This is why I do not want/need to await
those tasks.
Here a generic example:
import asyncio
async def fire_and_forget_coro():
"""Some random coroutine waiting for IO to complete."""
print('in fire_and_forget_coro()')
await asyncio.sleep(1.0)
print('fire_and_forget_coro() done')
async def async_main():
"""Main entry point of asyncio application."""
print('in async_main()')
n = 3
for _ in range(n):
# create_task() does not block, returns immediately.
# Note: We do NOT save a reference to the submitted task here!
asyncio.create_task(fire_and_forget_coro(), name='fire_and_forget_coro')
print('awaiting sleep in async_main()')
await asyncio.sleep(2.0) # <-- note this line
print('sleeping done in async_main()')
print('async_main() done.')
# all references of tasks we *might* have go out of scope when returning from this coroutine!
return
if __name__ == '__main__':
asyncio.run(async_main())
Output:
in async_main()
awaiting sleep in async_main()
in fire_and_forget_coro()
in fire_and_forget_coro()
in fire_and_forget_coro()
fire_and_forget_coro() done
fire_and_forget_coro() done
fire_and_forget_coro() done
sleeping done in async_main()
async_main() done.
When commenting out the await asyncio.sleep()
line, we never see fire_and_forget_coro()
finish. This is to be expected: When the event loop started with asyncio.run()
closes, tasks will not be executed anymore. But it appears that as long as the event loop is still running, all tasks will be taken care of, even when I never explicitly created references to them. This seem logical to me, as the event loop itself must have a reference to all scheduled tasks in order to run them. And we can even get them all using asyncio.all_tasks()
!
So, I think I can trust Python to have at least one strong reference to every scheduled tasks as long as the event loop it was submitted to is still running, and thus I do not have to manage references myself. But I would like a second opinion here. Am I right or are there pitfalls I have not yet recognized?
If I am right, why the explicit warning in the documentation? It is a usual Python thing that stuff is garbage-collected if you do not keep a reference to it. Are there situations where one does not have a running event loop but still some task objects to reference? Maybe when creating an event loop manually (never did this)?
There is an open issue at the cpython bug tracker at github about this topic I just found: https://github.com/python/cpython/issues/88831
Quote:
asyncio will only keep weak references to alive tasks (in
_all_tasks
). If a user does not keep a reference to a task and the task is not currently executing or sleeping, the user may get "Task was destroyed but it is pending!".
So the answer to my question is, unfortunately, yes. One has to keep around a reference to the scheduled task.
However, the github issue also describes a relatively simple workaround: Keep all running tasks in a set()
and add a callback to the task which removes itself from the set()
again.
running_tasks = set()
# [...]
task = asyncio.create_task(some_background_function())
running_tasks.add(task)
task.add_done_callback(lambda t: running_tasks.remove(t))