amazon-web-services amazon-sagemaker

AWS SageMaker endpoint infinite looping


I'm having an issue where an invoke_endpoint call causes the SageMaker endpoint to run in an infinite loop (see the logs below).

If I keep my request "live" (SDK/CLI), the model just keeps repeating the request until I interrupt it from the keyboard (Ctrl+C).

CloudWatch logs:

<CONTAINER_LOGS>
INFO:     169.254.178.2:52386 - "POST /invocations HTTP/1.1" 200 OK
INFO:     169.254.178.2:42776 - "GET /ping HTTP/1.1" 200 OK
...
<CONTAINER_LOGS>
INFO:     169.254.178.2:52386 - "POST /invocations HTTP/1.1" 200 OK
INFO:     169.254.178.2:42776 - "GET /ping HTTP/1.1" 200 OK
...

Example invocation:

aws sagemaker-runtime invoke-endpoint --endpoint-name <name> --body '{"x":"x"}' --content-type application/json output.json --region <region>
^C -> loop stops after canceling
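For the SDK side, the same call can be sketched in Python with boto3. The sketch makes the client's read timeout and retry count explicit, so that a slow response surfaces as a single error instead of the request being sent again. That the repeated POST lines come from client-side retries is an assumption here; the logs alone do not confirm it. The helper names `make_runtime_client` and `invoke`, and the 70-second timeout, are illustrative choices, not part of the original post.

```python
import json


def make_runtime_client(region_name, read_timeout=70, max_attempts=0):
    """Create a sagemaker-runtime client with explicit timeout/retry settings."""
    # Imported lazily so invoke() below can be tried with a stub client
    # on a machine that does not have boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(
        read_timeout=read_timeout,              # seconds to wait for a response
        retries={"max_attempts": max_attempts}, # 0 = no retries after the first attempt
    )
    return boto3.client("sagemaker-runtime", region_name=region_name, config=config)


def invoke(client, endpoint_name, payload):
    """Send one JSON request to a real-time endpoint and return the body as text."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # response["Body"] is a stream; read it once and decode.
    return response["Body"].read().decode("utf-8")
```

With retries disabled, a request that runs too long should fail once on the client, which makes it easier to tell a slow model apart from a request that is being resent.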

What am I missing here?

I saw a similar question on SO without answers [3].

[1] - https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html

[2] - https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-test-endpoints.html

[3] - AWS Sagemaker inference looping


Solution

  • The issue was the default timeout for real-time inference endpoints, which is 60 seconds.
    Exceeding that timeout seems to cause the request to be repeated for some reason (docs), most likely because the client (SDK/CLI) retries a request that timed out.

    Switching to an async inference endpoint solved it, since the request takes ~2 minutes.
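A minimal sketch of what calling an async inference endpoint might look like with boto3 follows. Instead of sending the payload in the request, the client uploads it to S3, calls `invoke_endpoint_async` with the S3 location, and later reads the result from the returned `OutputLocation`. It assumes the endpoint was created with an async inference configuration and that the caller can read and write the buckets involved. The helper names, bucket and key arguments, and polling intervals are hypothetical.

```python
import json
import time
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def invoke_async(runtime, s3, endpoint_name, bucket, key, payload):
    """Upload the payload to S3, queue the request, return the output S3 URI."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/json",
    )
    return response["OutputLocation"]


def wait_for_result(s3, output_location, poll_seconds=5, timeout_seconds=600):
    """Poll S3 until the result object exists, then return its bytes."""
    bucket, key = split_s3_uri(output_location)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            # Result not written yet; wait and try again.
            time.sleep(poll_seconds)
    raise TimeoutError(f"no result at {output_location} after {timeout_seconds}s")
```

Here `runtime` would be a `boto3.client("sagemaker-runtime")` and `s3` a `boto3.client("s3")`. Polling is the simplest option for a sketch; it does not check for a failed inference, so a request that errors out would only show up as the timeout at the end.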