c++ tesseract leptonica

Read a multi-page TIFF image with Tesseract and Leptonica


I want to read a multi-page TIFF file and save the OCR text of each page (image) it contains to a separate .txt file. With the code below, I cannot save each page under its own name. How can I do this? (C++ code)

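// Open the input image with the Leptonica library.
// Note: pixRead() loads only the first page of a multi-page TIFF.
Pix *image = pixRead("/usr/src/tesseract-3.02/phototest.tif");
api->SetImage(image);
// Get the OCR result
char *outText = api->GetUTF8Text();
// ... use outText ...
delete[] outText;
pixDestroy(&image);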

Solution

  • According to the Leptonica API, there is a dedicated function, pixReadTiff, that reads a given page of a TIFF file into a Pix.

    PIX *pixReadTiff(const char *filename, l_int32 n)
    

    It returns NULL (and may report an error) if the requested page does not exist, so you can simply iterate through all pages.

    To get the number of pages, you can use this function:

    l_int32 tiffGetCount(FILE *fp, l_int32 *pn)
    

    For other details, see the Leptonica API documentation, for example the source of tiffio.c: http://tpgit.github.io/Leptonica/tiffio_8c_source.html