I am using the Google Cloud Document AI service. I have built custom processors for form data extraction using the "Custom Entity Extractor", which processes PDF documents. I annotated the dataset and finished training my model. I can now call the processor through the Python SDK, send requests, and fetch responses.
While parsing the response, under result.document.entities[0].page_anchor.page_refs[0].bounding_poly.normalized_vertices I get normalized coordinate values, on a scale from 0 to 1, which represent the location of the entity/value on a given page of the PDF.
A sample of the values is shown below:
[x: 0.30874478816986084
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.36359813809394836
x: 0.30874478816986084
y: 0.36359813809394836]
In the page object, result.document.pages[0], I get the page dimensions in pixels. An example response looks like:
dimension {
width: 1681.0
height: 2379.0
unit: "pixels"
}
My expectations:
I want to get the positions of the entities by scaling up the normalized coordinates, and then crop that part of the PDF page, which I convert to an image using the pdf2image module. I am using the cv2 module for image processing.
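The scaling described above can be sketched in plain Python using the sample vertices and the page dimensions from the response. This is only an illustration: the helper name `normalized_to_pixel_box` is hypothetical (it is not part of the Document AI SDK), and it assumes the rendered page image is exactly 1681 x 2379 pixels, as reported in the dimension object.

```python
def normalized_to_pixel_box(vertices, width, height):
    """Return (left, top, right, bottom) in pixels for normalized (x, y) vertices."""
    # Multiply by the page size and round to the nearest whole pixel.
    xs = [int(x * width + 0.5) for x, _ in vertices]
    ys = [int(y * height + 0.5) for _, y in vertices]
    # Taking min/max makes the result independent of vertex order.
    return min(xs), min(ys), max(xs), max(ys)


# Sample normalized vertices from the response shown above.
sample_vertices = [
    (0.30874478816986084, 0.34131988883018494),
    (0.47531232237815857, 0.34131988883018494),
    (0.47531232237815857, 0.36359813809394836),
    (0.30874478816986084, 0.36359813809394836),
]

box = normalized_to_pixel_box(sample_vertices, 1681, 2379)
print(box)  # (519, 812, 799, 865)
```

The resulting tuple is in (left, top, right, bottom) order, so the entity spans x from 519 to 799 and y from 812 to 865 on a page of that size.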
The Document AI Toolbox SDK for Python has functionality to export images from an Entity bounding box. Currently, it's set to only export detected images (such as a profile photo from a driver's license), but the same code should work to export an image of an entity with text.
Something like this should work for you:
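from io import BytesIO

from PIL import Image

# documentai_document is the processed Document; documentai_entity is one of its entities.
page_ref = documentai_entity.page_anchor.page_refs[0]
doc_page = documentai_document.pages[page_ref.page]

# Load the page image returned in the Document AI response.
image_content = doc_page.image.content
doc_image = Image.open(BytesIO(image_content))
w, h = doc_image.size

# Scale the normalized (0-1) vertices to pixels, rounding to the nearest pixel.
vertices = [
    (int(v.x * w + 0.5), int(v.y * h + 0.5))
    for v in page_ref.bounding_poly.normalized_vertices
]

# Each vertex is an (x, y) pair; for an upright page, vertex 0 is top-left and vertex 2 is bottom-right.
(left, top), (right, bottom) = vertices[0], vertices[2]
# PIL's crop expects a (left, upper, right, lower) box.
entity_image = doc_image.crop((left, top, right, bottom))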
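Since the question works with pdf2image and cv2 rather than the page image embedded in the response, the same crop can be sketched on a NumPy array, which is how cv2 represents images. The helper `crop_entity` below is hypothetical. It scales by the array's own shape instead of the reported dimension values, on the assumption that pdf2image may render the page at a different resolution than the one Document AI reports. A blank array stands in for the rendered page.

```python
import numpy as np


def crop_entity(image, normalized_vertices):
    """Crop the axis-aligned box spanned by normalized (x, y) vertices."""
    # NumPy/cv2 images are indexed as (row, column), i.e. (y, x).
    height, width = image.shape[:2]
    xs = [int(x * width + 0.5) for x, _ in normalized_vertices]
    ys = [int(y * height + 0.5) for _, y in normalized_vertices]
    # Slice rows by y and columns by x.
    return image[min(ys):max(ys), min(xs):max(xs)]


# Stand-in for a page rendered at 1681 x 2379 pixels with 3 colour channels.
page = np.zeros((2379, 1681, 3), dtype=np.uint8)

# Sample normalized vertices from the question.
vertices = [
    (0.30874478816986084, 0.34131988883018494),
    (0.47531232237815857, 0.34131988883018494),
    (0.47531232237815857, 0.36359813809394836),
    (0.30874478816986084, 0.36359813809394836),
]

crop = crop_entity(page, vertices)
print(crop.shape)  # (53, 280, 3)
```

With real data, `page` would be the array form of a page returned by pdf2image's `convert_from_path`, and `vertices` would be built as `[(v.x, v.y) for v in page_ref.bounding_poly.normalized_vertices]`. Note that pdf2image yields PIL images in RGB order, so a conversion such as `cv2.cvtColor(arr, cv2.COLOR_RGB2BGR)` is likely needed before passing the array to other cv2 functions.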