aws-lambda amazon-textract

Timeout when calling Textract from Lambda function


I want to use a Lambda function to call Textract on an image in an S3 bucket. When I run the following in a SageMaker notebook, it works perfectly and takes only a couple of seconds.

import boto3
textract_client = boto3.client('textract')
textract_response = textract_client.detect_document_text(
    Document={'S3Object': {'Bucket': 'my-bucket-name', 'Name': 'my_image_file.png'}}
)

But when I run this in Lambda, the function times out. (I have the timeout set to the maximum of 15 minutes.)

I wondered whether it was a permissions issue, but the role for the Lambda function has both AmazonS3FullAccess and AmazonTextractFullAccess. (In general, this Lambda function can access S3 files just fine.)

Both the SageMaker notebook and the Lambda function are in the same region (us-east-2).


Solution

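  • Most likely your Lambda function is attached to a VPC. A function inside a VPC has no route to the public Textract endpoint unless its subnets go through a NAT gateway, so the call hangs until the function times out. (S3 may still work if the VPC already has an S3 gateway endpoint.) To fix it, add a VPC endpoint for Textract: VPC -> Endpoints -> Create endpoint

    Under AWS Services, search for com.amazonaws.us-east-2.textract (owner amazon, type Interface). The region in the service name must match the region of your Lambda function, which is us-east-2 here. Then select your VPC, the subnets the function runs in (don't forget to pick a subnet ID for each Availability Zone you tick), and a security group that allows inbound HTTPS (port 443) from the function.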
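
If you prefer to script the same console steps, here is a minimal sketch using boto3's EC2 client. The helper names (textract_endpoint_params, create_textract_endpoint) and all IDs are hypothetical placeholders, not something from the question. It assumes the VPC has DNS resolution and DNS hostnames enabled, which private DNS requires.

```python
def textract_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build the keyword arguments for EC2's create_vpc_endpoint call.

    Hypothetical helper: it mirrors the console fields from the answer above.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # The region in the service name must match the Lambda function's region.
        "ServiceName": f"com.amazonaws.{region}.textract",
        "SubnetIds": list(subnet_ids),
        # These security groups must allow inbound HTTPS (443) from the function.
        "SecurityGroupIds": list(security_group_ids),
        # With private DNS, the default Textract hostname resolves to the
        # endpoint, so the Lambda code itself does not need to change.
        "PrivateDnsEnabled": True,
    }


def create_textract_endpoint(ec2_client, **kwargs):
    """Create the interface endpoint and return its ID.

    ec2_client is expected to be a boto3 EC2 client, for example
    boto3.client("ec2", region_name="us-east-2").
    """
    params = textract_endpoint_params(**kwargs)
    response = ec2_client.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]


# Example call (placeholder IDs, replace with your own):
#
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")
#   create_textract_endpoint(
#       ec2,
#       region="us-east-2",
#       vpc_id="vpc-0123456789abcdef0",
#       subnet_ids=["subnet-0123456789abcdef0"],
#       security_group_ids=["sg-0123456789abcdef0"],
#   )
```

Once the endpoint status is "available", the original detect_document_text call should work from the Lambda function unchanged. The alternative is to route the function's subnets through a NAT gateway, which gives it access to every public AWS endpoint rather than only Textract.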