
How can I get the full list of page classifications that Amazon Textract (call_textract_lending) can return, other than 'PAYSLIPS' and 'CHECKS'?


I am trying to figure out which other page types I might get back, besides these, when I use call_textract_lending from textractcaller:

import boto3
from textractcaller import call_textract_lending
import sagemaker
import os

document = 'lending_package.pdf'

# variables
data_bucket = sagemaker.Session().default_bucket()
region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

os.environ["BUCKET"] = data_bucket
os.environ["REGION"] = region
role = sagemaker.get_execution_role()

print(f"SageMaker role is: {role}\nDefault SageMaker Bucket: s3://{data_bucket}")

s3 = boto3.client('s3')
textract = boto3.client('textract', region_name=region)

# Upload the lending package PDF to the default SageMaker bucket
# (boto3 equivalent of: !aws s3 cp docs/lending_package.pdf s3://{data_bucket}/idp/textract/)
s3_key = 'idp/textract/' + document
s3.upload_file(os.path.join('docs', document), data_bucket, s3_key)

input_file = f's3://{data_bucket}/{s3_key}'
print(f"Lending Package uploaded to S3: {input_file}")

# Process document
textract_json = call_textract_lending(input_document=input_file, boto3_textract_client=textract)

# Print results
results = textract_json['Results']

for page in results:
    # PageType is a list of Prediction objects ({'Value': ..., 'Confidence': ...})
    page_type = page["PageClassification"]["PageType"][0]["Value"]
    print(f"Page Number: {page['Page']}, Page Classification: {page_type}")

That is the code I used; the page classifications I got back were 'PAYSLIPS' and 'CHECKS'.

I checked the source of call_textract_lending (https://github.com/aws-samples/amazon-textract-textractor/blob/master/caller/textractcaller/t_call.py#L19) and the AWS documentation for the Prediction data type (https://docs.aws.amazon.com/textract/latest/dg/API_Prediction.html), but neither lists the possible values. Is there a way to find them?
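Per the Prediction reference linked above, each entry in PageType is a Prediction object with a Value and a Confidence. As a minimal sketch (assuming the dict returned by call_textract_lending has the GetLendingAnalysis shape Results → PageClassification → PageType; the helper name count_page_types is hypothetical), you can at least tally which page types actually appear in a response:

```python
from collections import Counter
from typing import Any, Dict


def count_page_types(textract_json: Dict[str, Any]) -> Counter:
    """Count how many pages were classified as each page type.

    Assumes the GetLendingAnalysis response shape:
    Results[i]["PageClassification"]["PageType"] is a list of
    Prediction objects ({"Value": ..., "Confidence": ...}).
    """
    counts: Counter = Counter()
    for page in textract_json.get("Results", []):
        predictions = page.get("PageClassification", {}).get("PageType", [])
        if not predictions:
            continue
        # Keep only the highest-confidence prediction for this page.
        best = max(predictions, key=lambda p: p.get("Confidence", 0.0))
        counts[best["Value"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {
        "Results": [
            {"Page": 1, "PageClassification": {"PageType": [{"Value": "PAYSLIPS", "Confidence": 99.1}]}},
            {"Page": 2, "PageClassification": {"PageType": [{"Value": "CHECKS", "Confidence": 98.7}]}},
            {"Page": 3, "PageClassification": {"PageType": [{"Value": "PAYSLIPS", "Confidence": 97.0}]}},
        ]
    }
    for page_type, n in count_page_types(sample).most_common():
        print(page_type, n)
```

In the notebook you would call count_page_types(textract_json) on the real response. This only shows the types present in your documents, not every type the service can return.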


Solution

  • I found that the list of document types is available at https://docs.aws.amazon.com/textract/latest/dg/lending-response-objects.html under "Document Types", and the list can also be downloaded from that page.
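Once you have that list, a small check can flag any page type in a response that is not on it. A minimal sketch, assuming you save the downloaded list yourself as plain text with one type per line; the file name document_types.txt and the helper names are hypothetical:

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Set


def load_known_types(path: str) -> Set[str]:
    """Read one document type per line, ignoring blank lines.

    Assumes you saved the list from the Textract docs to this file yourself.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def unknown_page_types(textract_json: Dict[str, Any], known: Iterable[str]) -> Set[str]:
    """Return every PageType value in the response that is not in `known`."""
    known_set = set(known)
    found: Set[str] = set()
    for page in textract_json.get("Results", []):
        for pred in page.get("PageClassification", {}).get("PageType", []):
            found.add(pred["Value"])
    return found - known_set


if __name__ == "__main__":
    # Hypothetical values for illustration; in practice use
    # load_known_types("document_types.txt") and the textract_json
    # returned by call_textract_lending.
    known = {"PAYSLIPS", "CHECKS"}
    sample = {"Results": [{"PageClassification": {"PageType": [{"Value": "SOME_OTHER_TYPE", "Confidence": 95.0}]}}]}
    print(unknown_page_types(sample, known))
```

An empty set means every page was classified into a type from the documented list.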