amazon-web-services amazon-textract

Lambda function is not detecting tables correctly


I'm trying to complete a lab in the Machine Learning path in AWS Cloud Quest, but I get the error "Your Lambda function is not detecting tables correctly".

I tried a few other approaches, but nothing works. It seems I only need to uncomment a few lines of code and change the feature type from FIELDS to TABLES, but it doesn't pass the test. I'm stuck.

What am I doing wrong? Has anyone here completed this lab?

Here is my code

import json
import logging
import boto3

from trp import Document
from urllib.parse import unquote_plus

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')

output_key = "output/textract_response.json"


def lambda_handler(event, context):

    logger.info(event)
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        textract = boto3.client('textract')

        try:
            response = textract.analyze_document(  
                Document={                         
                    'S3Object': {
                        'Bucket': bucket,
                        'Name': key
                    }
                },
                FeatureTypes=['TABLES']  # FeatureTypes is a list of the types of analysis to perform.
            )

            doc = Document(response)  

            for page in doc.pages:
                print("Fields:")
                for field in page.form.fields:
                    print("Key: {}, Value: {}".format(field.key, field.value))

                    print("\nSearch Fields:")
                    search_key = "address"  # separate name so the S3 object key above is not overwritten
                    fields = page.form.searchFieldsByKey(search_key)
                    for field in fields:
                        print("Key: {}, Value: {}".format(field.key, field.value))

            for page in doc.pages:
                print("\nTable details:")
                for table in page.tables:
                    for r, row in enumerate(table.rows):
                        for c, cell in enumerate(row.cells):
                            print("Table[{}][{}] = {}".format(r, c, cell.text))

            return_result = {"Status": "Success"}

            # Finally the response file will be written in the S3 bucket output folder.
            s3.put_object(
                Bucket=bucket,
                Key=output_key,
                Body=json.dumps(response, indent=4)
            )

            return return_result
        except Exception as error:
            # Log the full traceback to CloudWatch before reporting the failure.
            logger.exception("Textract analysis failed")
            return {"Status": "Failed", "Reason": str(error)}



Solution

  • I work for AWS (sales, not technical!) and had this same issue: validation kept failing even though the CloudWatch logs showed table data, which confirmed the Lambda code was correct. I raised a support ticket to check that the validation service was working correctly. Our Support team found an issue with the assignment validation for the "Extract Text from Docs" lab, which has now been fixed. I just did the lab again and the validation worked, so the lab is now complete. You may want to try again now. Hope this helps. :)
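
If you want to confirm table detection yourself, independently of the lab's validator, one option is to inspect the JSON that the Lambda writes to output/textract_response.json. The sketch below is a minimal check, not part of the lab: it counts the blocks whose BlockType is TABLE or CELL in a Textract AnalyzeDocument response. The helper name summarize_tables and the hand-made sample_response are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter


def summarize_tables(response):
    """Count TABLE and CELL blocks in a Textract AnalyzeDocument response dict.

    Hypothetical helper for illustration; it only reads the "Blocks" list
    and each block's "BlockType" field.
    """
    counts = Counter(block.get("BlockType") for block in response.get("Blocks", []))
    return {"tables": counts.get("TABLE", 0), "cells": counts.get("CELL", 0)}


# Hand-made fragment standing in for a real response (assumption: a real
# response would be loaded with json.load from the file saved in S3).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "TABLE", "Id": "t1"},
        {"BlockType": "CELL", "Id": "c1", "RowIndex": 1, "ColumnIndex": 1},
        {"BlockType": "CELL", "Id": "c2", "RowIndex": 1, "ColumnIndex": 2},
    ]
}

summary = summarize_tables(sample_response)
print(summary)
```

A non-zero table count in the saved response, together with the "Table[r][c] = ..." lines in the CloudWatch logs, suggests the function itself is working and that a failing check lies elsewhere.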