Tags: python, asynchronous, celery, gevent, greenlets

How to make Celery I/O-bound tasks execute concurrently?


The only thing my Celery task does is make an API request and push the response onto a Redis queue. What I'd like to achieve is to utilize resources as fully as possible by executing tasks in a coroutine-like fashion: every time a coroutine hits requests.post(), the context switcher can switch and allocate resources to another coroutine so it can send one more request, and so forth.

As I understand, to achieve this, my worker has to run with a gevent execution pool:

celery worker --app=worker.app --pool=gevent --concurrency=500

But that alone doesn't solve the problem. I have found that (probably) for it to work as expected, monkey patching is needed:

import requests

@app.task
def task_make_request(payload):
    # patch blocking stdlib calls so requests yields to other greenlets
    import gevent.monkey
    gevent.monkey.patch_all()

    requests.post('url', data=payload)

The questions:

  1. Is gevent the only execution pool that can be used for this goal?
  2. Will patch_all make requests.post() asynchronous so that the context switcher can allocate resources to other coroutines?
  3. What is the preferred way of achieving cooperative multitasking for Celery tasks with a single I/O-bound operation (an API call)?

Solution

  • When you run under the gevent pool, monkey patching happens almost immediately on worker startup (see celery.__init__), and does not need to be repeated inside each task. This patches threading and the related concurrency modules, socket included. You can inspect this yourself by fishing around in the modules requests relies on at runtime; a minimal check is sketched below.
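    For example (a sketch; is_module_patched is gevent.monkey's own introspection helper, and the module names are just the ones relevant here):

        import gevent.monkey

        # Under --pool=gevent both report True: Celery applied patch_all()
        # before the task module was even imported.
        print(gevent.monkey.is_module_patched('socket'))
        print(gevent.monkey.is_module_patched('threading'))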

    You can also use the eventlet worker pool; the Celery repository includes a webscraping example for it.
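    As for the preferred way (question 3): keep the task itself free of patching logic and let the pool handle it at startup. A minimal sketch, assuming a local Redis broker (the module, URL, and task names here are illustrative):

        # tasks.py
        import requests
        from celery import Celery

        app = Celery('tasks', broker='redis://localhost:6379/0')

        @app.task
        def task_make_request(url, payload):
            # Under --pool=gevent this post() blocks only the current greenlet;
            # gevent switches to another greenlet while this one waits on I/O.
            return requests.post(url, data=payload).text

    started with:

        celery worker --app=tasks --pool=gevent --concurrency=500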