python google-cloud-vertex-ai cloud-document-ai gcp-ai-platform-training

Document AI - Converting the normalized_vertices to the original scale of the document

I am using the Google Cloud Document AI service. I have built custom processors for form data extraction with the "Custom Entity Extractor", which processes PDF documents. I annotated the dataset and finished training my model. I can now call the processor from the Python SDK, send input requests, and fetch responses.

While parsing the response, I get normalized coordinate values under result.document.entities[0].page_anchor.page_refs[0].bounding_poly.normalized_vertices. These are on a scale from 0 to 1 and represent the location of the entity value on a given page of the PDF.

A sample of the values:

[x: 0.30874478816986084
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.36359813809394836
x: 0.30874478816986084
y: 0.36359813809394836]

In the page object, result.document.pages[0], I get the dimensions of the page in pixels. An example response looks like this:

dimension {
  width: 1681.0
  height: 2379.0
  unit: "pixels"
}

My Expectations:

I want to get the pixel positions of the entities by scaling up the normalized coordinates, and then crop that region from the PDF page, which I have converted to an image using the pdf2image module.

I am using cv2 module for image processing here.
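The scaling itself is a multiplication of each normalized x by the image width and each normalized y by the image height. Below is a minimal sketch of that step using the sample values from the question; the helper names `scale_vertices` and `bounding_box` are hypothetical and not part of any Document AI library. Because the coordinates are relative, it is probably safest to multiply by the size of the image actually being cropped (for a cv2 image, `height, width = image.shape[:2]`), since a page rendered by pdf2image may not have the same pixel size as the `dimension` object reports.

```python
def scale_vertices(normalized_vertices, width, height):
    """Map (x, y) pairs in the 0-1 range to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in normalized_vertices]


def bounding_box(pixel_vertices):
    """Return (left, top, right, bottom) enclosing all the vertices."""
    xs = [x for x, _ in pixel_vertices]
    ys = [y for _, y in pixel_vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Sample values from the question, as (x, y) pairs.
normalized = [
    (0.30874478816986084, 0.34131988883018494),
    (0.47531232237815857, 0.34131988883018494),
    (0.47531232237815857, 0.36359813809394836),
    (0.30874478816986084, 0.36359813809394836),
]

# Page size taken from the `dimension` object in the question (assumed to
# match the image being cropped; otherwise use the image's own size).
left, top, right, bottom = bounding_box(scale_vertices(normalized, 1681, 2379))
print(left, top, right, bottom)  # 519 812 799 865
```

With a cv2 (NumPy) image, rows are y and columns are x, so the crop would be `image[top:bottom, left:right]`.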

Solution

The Document AI Toolbox SDK for Python has functionality to export images from an Entity bounding box. Currently, it only exports detected images (such as a profile photo from a driver's license), but the same code should work to export an image of an entity that contains text.

https://github.com/googleapis/python-documentai-toolbox/blob/c1843812d988b4a9877b66176be8d103b55b112a/google/cloud/documentai_toolbox/wrappers/entity.py#LL66C5-L90C64

Something like this should work for you:

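from io import BytesIO
from PIL import Image

# documentai_document is the processed Document; documentai_entity is one of its entities
page_ref = documentai_entity.page_anchor.page_refs[0]
doc_page = documentai_document.pages[page_ref.page]
image_content = doc_page.image.content

# Load the page image returned by Document AI and read its pixel size
doc_image = Image.open(BytesIO(image_content))
w, h = doc_image.size
# Scale each normalized (x, y) vertex to pixels, rounding to the nearest integer
vertices = [
  (int(v.x * w + 0.5), int(v.y * h + 0.5)) for v in page_ref.bounding_poly.normalized_vertices
]
# For an upright box, vertices[0] is the top-left corner and vertices[2] the bottom-right, each as (x, y)
(left, top), (right, bottom) = vertices[0], vertices[2]
# PIL's crop box is (left, upper, right, lower)
entity_image = doc_image.crop((left, top, right, bottom))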
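The snippet above handles a single entity. To collect a pixel box for every entity in the response, the same steps can be wrapped in a loop, as in the rough sketch below. The function name `entity_pixel_boxes` is hypothetical. It assumes that each page carries its rendered image size in `image.width` and `image.height`, that the Python client exposes the entity label as `type_`, and that entities without bounding vertices should simply be skipped.

```python
def entity_pixel_boxes(document):
    """Return (entity type, page index, (left, top, right, bottom)) per page ref."""
    boxes = []
    for entity in document.entities:
        for page_ref in entity.page_anchor.page_refs:
            vertices = page_ref.bounding_poly.normalized_vertices
            if not vertices:
                # Some entities carry no bounding polygon; nothing to crop.
                continue
            # Scale by the size of the page image that would be cropped.
            image = document.pages[page_ref.page].image
            xs = [round(v.x * image.width) for v in vertices]
            ys = [round(v.y * image.height) for v in vertices]
            # min/max makes the box independent of the vertex order.
            box = (min(xs), min(ys), max(xs), max(ys))
            boxes.append((entity.type_, page_ref.page, box))
    return boxes
```

Each returned box is already in PIL's (left, upper, right, lower) order, so it can be passed directly to `crop`.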