I'm developing a Flask Python backend hosted on AWS Lambda (using Zappa to automatically deploy to Lambda and API Gateway). One of my endpoints takes around 30-60s to execute, which causes my API Gateway to sometimes time out due to CloudFront limitations.
I am thinking about reimplementing this endpoint asynchronously: the initial Lambda function would queue the task, and a separate AWS Lambda function would then run the long-running task. However, I am concerned that this is not the right way to go about it. My rationale is that Lambda functions are meant for stateless, very quick processes, so using one to execute a long-running endpoint task seems wrong. Is there a better way to do this (particularly within the Flask/Zappa/AWS setup that I'm using right now)?
Lambda is the only service afaik that can:
If the long-running task can complete within Lambda's 15-minute maximum timeout, then use Lambda. Note that Lambda allocates CPU (and network bandwidth) in proportion to the memory you configure, so a function with more RAM can accomplish more in 15 minutes. In this scenario, note that API Gateway has an integration timeout of 29 seconds for Lambda functions, so the Lambda function initially invoked by API Gateway would have to asynchronously invoke a second Lambda function to perform the long-running task, and then return straight away.
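A minimal sketch of that two-function pattern with boto3, assuming the client later polls for the result using a job ID. The names `start_long_task`, `process_job`, `run_task`, and `status_store` are hypothetical. `status_store` stands in for a shared store such as a DynamoDB table, because two Lambda functions do not share memory; a plain dict only works for local testing. `lambda_client` is expected to be `boto3.client("lambda")` and is passed in so the sketch can be exercised with a fake.

```python
import json
import uuid
from typing import Any, Callable, MutableMapping


def start_long_task(lambda_client, worker_function_name: str, params: dict,
                    status_store: MutableMapping[str, dict]) -> str:
    """Runs in the API Gateway-facing Lambda: queue the job and return at once."""
    job_id = str(uuid.uuid4())
    # Hypothetical shared store (e.g. a DynamoDB table in a real deployment).
    status_store[job_id] = {"status": "PENDING"}
    # InvocationType="Event" makes the call asynchronous: Lambda queues the
    # event and returns status code 202 without waiting for the worker.
    response = lambda_client.invoke(
        FunctionName=worker_function_name,
        InvocationType="Event",
        Payload=json.dumps({"job_id": job_id, "params": params}).encode("utf-8"),
    )
    if response["StatusCode"] != 202:
        status_store[job_id] = {"status": "FAILED"}
        raise RuntimeError(f"could not queue job {job_id}")
    return job_id


def process_job(event: dict, status_store: MutableMapping[str, dict],
                run_task: Callable[[dict], Any]) -> None:
    """Body of the second (worker) Lambda, which may run for up to 15 minutes."""
    job_id = event["job_id"]
    status_store[job_id] = {"status": "RUNNING"}
    try:
        result = run_task(event["params"])
    except Exception as exc:
        status_store[job_id] = {"status": "FAILED", "error": str(exc)}
        raise  # re-raise so Lambda's async retry behaviour still applies
    status_store[job_id] = {"status": "DONE", "result": result}
```

The worker's handler could then be as small as `def handler(event, context): process_job(event, store, run_task)`, and the Flask endpoint would return the job ID so the client can poll a separate status endpoint. Because failures are re-raised, Lambda's retries for asynchronous invocations (two by default) still apply, so `run_task` should be safe to repeat. Zappa also has an asynchronous task helper (`zappa.asynchronous.task`) that performs this kind of `Event` invocation for you; check the Zappa documentation for your version before relying on it.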
If the task takes longer than 15 minutes, can it be split into multiple sub-tasks each of which completes in under 15 minutes? If so, then look at Step Functions with multiple concurrent or serial Lambda functions.
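If the work can be split, a Step Functions state machine can fan it out. The sketch below builds an Amazon States Language definition as a Python dict: a hypothetical "split" Lambda returns a list of chunks, a `Map` state runs a hypothetical "chunk" Lambda on each item (concurrently, or one at a time when `max_concurrency` is 1), and a hypothetical "combine" Lambda receives the array of results. The ARNs are placeholders and the retry settings are only an example.

```python
import json


def build_fan_out_definition(split_arn: str, chunk_arn: str, combine_arn: str,
                             max_concurrency: int = 10) -> dict:
    """Return an ASL definition: split -> Map over chunks -> combine."""
    return {
        "Comment": "Split a long job into sub-tasks that each fit in one Lambda run",
        "StartAt": "SplitJob",
        "States": {
            "SplitJob": {
                "Type": "Task",
                "Resource": split_arn,
                # Store the split Lambda's output (a list) at $.chunks.
                "ResultPath": "$.chunks",
                "Next": "ProcessChunks",
            },
            "ProcessChunks": {
                "Type": "Map",
                "ItemsPath": "$.chunks",
                # 1 = serial processing; larger values allow concurrency.
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "ProcessChunk",
                    "States": {
                        "ProcessChunk": {
                            "Type": "Task",
                            "Resource": chunk_arn,
                            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                                       "MaxAttempts": 2}],
                            "End": True,
                        }
                    },
                },
                # The Map state's output is the array of per-chunk results.
                "Next": "CombineResults",
            },
            "CombineResults": {"Type": "Task", "Resource": combine_arn, "End": True},
        },
    }


if __name__ == "__main__":
    prefix = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"  # placeholder ARN prefix
    definition = build_fan_out_definition(prefix + "split-job",
                                          prefix + "process-chunk",
                                          prefix + "combine-results")
    print(json.dumps(definition, indent=2))
```

The resulting JSON string could be passed as the `definition` argument of `create_state_machine` on a boto3 `stepfunctions` client, together with an IAM role ARN that is allowed to invoke the three functions.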
Otherwise, it sounds like you may need a different compute solution, for example ECS or EC2. Note, however, that persistent EC2- or ECS-based solutions will have some ongoing cost.
Another option for tasks that don't fit within Lambda's 15-minute timeout is to build some orchestration that runs your task on a newly launched EC2 instance which terminates itself when the task is complete, so you only pay for EC2 while the task is running. This option is more complex because you have to build the orchestration yourself: for example, trigger a Lambda function that launches an EC2 instance, pass the task specification to that instance (e.g. in a launch-time user data script or via Systems Manager (SSM) Run Command), monitor for success, collect the output, terminate the instance when the task completes, and handle retries and failures.
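A rough sketch of the launch step only, assuming the task can be started by a single shell command on an AMI that already has your code installed. It calls EC2 `run_instances` with `InstanceInitiatedShutdownBehavior="terminate"`, so that when the user data script shuts the machine down, the instance is terminated rather than stopped. `launch_task_instance`, `build_user_data`, the AMI ID, and the task command are hypothetical; `ec2_client` is expected to be `boto3.client("ec2")`. Monitoring, output collection, and retries are not shown.

```python
import shlex
from typing import Optional


def build_user_data(job_id: str, task_command: str) -> str:
    """Shell script run once at first boot; shuts the instance down on exit."""
    return "\n".join([
        "#!/bin/bash",
        # The EXIT trap fires whether the task succeeds or fails.
        "trap 'shutdown -h now' EXIT",
        f"export JOB_ID={shlex.quote(job_id)}",
        task_command,
    ]) + "\n"


def launch_task_instance(ec2_client, ami_id: str, instance_type: str,
                         job_id: str, task_command: str,
                         instance_profile_name: Optional[str] = None) -> str:
    """Launch one EC2 instance that runs the task and then terminates itself."""
    kwargs = dict(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        # boto3 base64-encodes UserData for run_instances automatically.
        UserData=build_user_data(job_id, task_command),
        # An OS-level shutdown terminates (not stops) the instance.
        InstanceInitiatedShutdownBehavior="terminate",
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "job_id", "Value": job_id}],
        }],
    )
    if instance_profile_name:
        # Lets the task use AWS APIs, e.g. to write its output to S3.
        kwargs["IamInstanceProfile"] = {"Name": instance_profile_name}
    response = ec2_client.run_instances(**kwargs)
    return response["Instances"][0]["InstanceId"]
```

The returned instance ID (and the `job_id` tag) gives the rest of the orchestration something to watch, for example when checking whether the instance has reached the `terminated` state.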