
Amazon Textract - How to define my key-value pairs

I have tried Textract, and I can see that it extracts a few interesting key-value pairs.

I have an image dataset in which each image is annotated with a set of domain-specific key-value pairs that differ from the ones Textract found.

Is there any way to make Textract look for my key-value pairs, for example through some kind of transfer learning or a specific configuration of the tool?


  • No. There is no way to change how Textract detects text or identifies relationships between text elements. You can keep submitting your images and forms, and Textract will (in theory) train itself on them, but I doubt that will help much. Instead, you can take the raw detected text and write your own script to pair it up into key-value relationships. Note that Textract returns the raw text in the order in which it finds it on the image or PDF, so it is fairly easy to write your own logic that maps it however you want.
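To make the "own script" idea above concrete, here is a minimal sketch in Python. It assumes the response has the shape returned by Textract's text detection call, i.e. a `Blocks` list whose `LINE` entries carry a `Text` field. The helper names (`detected_lines`, `map_custom_keys`), the key labels, and the sample data are all hypothetical, and the matching rule (a line starting with a known label, with the value either on the same line or on the next one) is only one possible heuristic, not something Textract provides.

```python
def detected_lines(response):
    """Return the text of every LINE block, in the order Textract returned it."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def map_custom_keys(lines, key_labels):
    """Pair each known key label with a value (hypothetical heuristic).

    A line that starts with a label (case-insensitive) supplies the value
    from the rest of that line; if nothing follows the label, the next
    detected line is used instead.
    """
    pairs = {}
    for i, line in enumerate(lines):
        lowered = line.lower()
        for label in key_labels:
            if lowered.startswith(label.lower()):
                # Drop the label itself plus any separator such as ":".
                value = line[len(label):].strip(" :\t")
                if not value and i + 1 < len(lines):
                    value = lines[i + 1].strip()
                pairs[label] = value
                break
    return pairs


# Made-up sample data mimicking the structure of a Textract response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Patient ID: 12345"},
        {"BlockType": "WORD", "Text": "Patient"},
        {"BlockType": "LINE", "Text": "Diagnosis"},
        {"BlockType": "LINE", "Text": "Fracture"},
    ]
}

my_keys = ["Patient ID", "Diagnosis"]  # your own domain-specific labels
pairs = map_custom_keys(detected_lines(sample_response), my_keys)
print(pairs)
```

With a real document, the response would come from something like boto3's `detect_document_text` on a Textract client. Each `LINE` block also includes a `Geometry` bounding box, which could be used for position-based matching if pairing by reading order alone is not reliable enough for your forms.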