AWS Textract document analysis over a defined list of pages


I want to use Textract document analysis from Python 3.9, applying specific FeatureTypes to certain pages of a PDF. Is it possible to send the entire PDF file and request analysis of only some pages, for example running only the SIGNATURES feature on pages 3 and 5 of a 7-page PDF?


Solution

  • Yes, but first you need to extract the pages you want from the PDF and combine them into a new PDF that contains only pages 3 and 5, then send that file to Textract.

    For splitting a PDF in Python, see the answers to "split a multi-page pdf file into multiple pdf files with python?"

    For merging PDF files in Python, see the answers to "Merge PDF files".