I'd like to add character recognition to my application, so I'm asking: what is the best available and affordable OCR SDK? I looked at ABBYY FineReader Engine 10.0, but I haven't yet received the trial version I requested from the official site.
I've downloaded the Asprise OCR SDK, but it doesn't recognize Cyrillic characters.
How can I implement character recognition in my application? Which libraries, SDKs, or APIs should I use?
There's CuneiForm and Google's Tesseract OCR, both of which are free. Personally, I've used Tesseract. The SDK was giving me a lot of trouble, so I finally decided to simply call Tesseract's command-line interface with arguments from within my C program using the system() function.
Lots of people face difficulties with the Tesseract installation, so here's a short summary (version 2 works for me; substitute the appropriate version if necessary):
Download the following from the svn: tesseract-2.00.tar.gz, tesseract-2.00.exe6.tar.gz, tesseract-2.00.eng.tar.gz
Unzip tesseract-2.00.tar.gz to a folder.
Unzip tesseract-2.00.exe6.tar.gz and move its contents to where tesseract-2.00.tar.gz was unzipped. A few files will be replaced this way.
Similarly, unzip tesseract-2.00.eng.tar.gz and move its contents to the same folder, where the tessdata folder will be replaced.
After all this is done, open the tesseract.dsw workspace, select All Files and do "Rebuild All." This'll take a while with loads of warnings but hopefully no errors.
The command in a DOS shell is tesseract picture.tif textfile -l eng. So basically, save your image as a TIFF file, run the command from within your program, and then read the OCR output strings from the text file.
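For what it's worth, here is a minimal C sketch of that approach: build the command line, run it with system(), then read the output file back into memory. The function names (build_ocr_command, read_text_file, ocr_image) are made up for illustration and are not part of Tesseract. It assumes the tesseract executable is on your PATH, that Tesseract writes its result to the output base name with .txt appended (textfile.txt for the command above), and that the paths contain no spaces, since nothing is quoted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "tesseract <image> <out_base> -l <lang>".
   Returns 0 on success, -1 if the buffer is too small. */
int build_ocr_command(char *buf, size_t size, const char *image,
                      const char *out_base, const char *lang)
{
    int n = snprintf(buf, size, "tesseract %s %s -l %s", image, out_base, lang);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: reads a whole file into a malloc'ed,
   NUL-terminated string. The caller frees it. Returns NULL on failure. */
char *read_text_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long len = ftell(fp);
    if (len < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *text = malloc((size_t)len + 1);
    if (text == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t got = fread(text, 1, (size_t)len, fp);
    fclose(fp);
    text[got] = '\0';
    return text;
}

/* Hypothetical wrapper: runs the command via system() and returns the
   recognized text, or NULL if any step fails.
   Assumption: the output lands in "<out_base>.txt". */
char *ocr_image(const char *image, const char *out_base, const char *lang)
{
    char cmd[1024];
    char out_path[512];

    if (build_ocr_command(cmd, sizeof cmd, image, out_base, lang) != 0)
        return NULL;
    int n = snprintf(out_path, sizeof out_path, "%s.txt", out_base);
    if (n < 0 || (size_t)n >= sizeof out_path)
        return NULL;
    if (system(cmd) != 0)   /* non-zero status is treated as failure */
        return NULL;
    return read_text_file(out_path);
}
```

You would call it as char *text = ocr_image("picture.tif", "textfile", "eng"); and free(text) when done. Older compilers such as the one that opens .dsw workspaces may only offer _snprintf instead of snprintf, so adjust accordingly.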