classification ocr document handwriting-recognition vision-api

Document classification: handwritten or computer printed


I have many documents, some handwritten and some computer printed (scanned images/PDFs). I want to separate them into two groups: computer printed and handwritten. Could anyone please guide me on how to approach this? I am using the Google Vision API for data extraction, but I want to extract data from the handwritten documents only.

Adding more details: I am calling the Google Vision API through the RPA tool UiPath, and I am restricted to using the Google Vision API only for data/text extraction from images. I am not looking for machine learning solutions like AutoML or a custom machine learning project. I am looking for an approach where I can build a small program to identify whether a document is computer printed or handwritten. The program will take an image or PDF as input and output whether it is computer printed or handwritten.

Any help would be appreciated.


Solution

  • You can check out OpenCV's template matching. Handwritten words are almost never identical, while computer-printed words look the same every time, so you can take a letter template and check the match results. If the document contains your template with high confidence, it is computer printed.