[SOLVED] How to extract different sections of a PDF with Document AI

I want to show a list of the different sections of a PDF file, like what is shown in the picture. I'm calling the processor through the REST API from Flutter Web.

I tried getting the entities from the API response using fieldMask but got nothing for the document in the picture. I'm not sure which fields should be used to get the desired response.

Solution

The Document OCR processor returns text and layout information in the Document JSON format. Each of the sections highlighted in the UI is a Block or a Paragraph, so you will need to parse the JSON response to get the data for each section, including its bounding box.
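As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes the REST response has already been decoded from JSON and follows the Document JSON layout (document.text, pages[].blocks[].layout.textAnchor.textSegments, and layout.boundingPoly.normalizedVertices). The helper names text_from_anchor and extract_sections are hypothetical, and sample_response is a tiny made-up response, not real processor output. The same traversal should carry over to a decoded JSON map in Dart.

```python
def text_from_anchor(full_text, layout):
    """Join the slices of document.text that a layout's textAnchor points to."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # In the JSON encoding, the int64 indices arrive as strings, and a
    # startIndex of 0 is usually omitted, so default it to 0.
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg.get("endIndex", 0))]
        for seg in segments
    )


def extract_sections(document, level="blocks"):
    """Return page number, text, and bounding box for each block or paragraph.

    `level` is the page field to walk: "blocks" or "paragraphs".
    """
    full_text = document.get("text", "")
    sections = []
    for page in document.get("pages", []):
        for item in page.get(level, []):
            layout = item.get("layout", {})
            vertices = layout.get("boundingPoly", {}).get("normalizedVertices", [])
            sections.append({
                "page": page.get("pageNumber"),
                "text": text_from_anchor(full_text, layout).strip(),
                # Normalized coordinates are in [0, 1]; zero values may be omitted.
                "box": [(v.get("x", 0.0), v.get("y", 0.0)) for v in vertices],
            })
    return sections


# Made-up response for illustration only (assumption, not real output).
sample_response = {
    "document": {
        "text": "Invoice\nTotal: 42\n",
        "pages": [{
            "pageNumber": 1,
            "blocks": [
                {"layout": {
                    "textAnchor": {"textSegments": [{"endIndex": "8"}]},
                    "boundingPoly": {"normalizedVertices": [
                        {"x": 0.1, "y": 0.1}, {"x": 0.9, "y": 0.1},
                        {"x": 0.9, "y": 0.15}, {"x": 0.1, "y": 0.15}]},
                }},
                {"layout": {
                    "textAnchor": {"textSegments": [
                        {"startIndex": "8", "endIndex": "18"}]},
                    "boundingPoly": {"normalizedVertices": [
                        {"y": 0.2}, {"x": 0.5, "y": 0.2},
                        {"x": 0.5, "y": 0.25}, {"y": 0.25}]},
                }},
            ],
        }],
    }
}

sections = extract_sections(sample_response["document"])
for section in sections:
    print(section["page"], repr(section["text"]), section["box"])
```

Passing level="paragraphs" walks pages[].paragraphs instead, which gives finer-grained sections. To draw a box on a rendered page, multiply the normalized coordinates by the page width and height.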

You can refer to "Handle the processing response" > "Text, layout, and quality scores" in the documentation for explanations of how the output is structured and for code samples that parse it.

You can also refer to these open source sample web applications that show use cases similar to what you are asking:

https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/web-app-demo
https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/web-app-pix2info-python