python ocr python-tesseract

Does anyone know the meaning of the output of image_to_data and image_to_osd methods of pytesseract?


I'm trying to extract the data from an image using pytesseract. This module has image_to_data and image_to_osd methods. These two methods provide lots of info (TextLineOrder, WritingDirection, ScriptDetection, Orientation, etc...) as output.

The image below is the output of the image_to_data method. What do the values of these columns (level, block_num, par_num, line_num, word_num) mean?

[image: output of image_to_data]

The output of image_to_osd is shown below. What is the meaning of each term in it?

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 16.47
Script: Latin
Script confidence: 4.00

I referred to the docs but did not find any info regarding these parameters.


Solution

  • Column level: how deep the item sits in the layout hierarchy.

    1. Page: item with no block_num, par_num, line_num, word_num (all of them are 0)
    2. Block: item with block_num and with no par_num, line_num, word_num
    3. Paragraph: item with block_num, par_num and with no line_num, word_num
    4. Line: item with block_num, par_num, line_num, and with no word_num
    5. Word: item with all those numbers
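
To make the five cases above concrete, here is a minimal sketch that counts how many rows of each level a result contains. It assumes the dict shape returned by pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT); the helper name count_levels and the sample values are made up for illustration and are not real OCR output.

```python
from collections import Counter

# Tesseract's level values: 1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word.
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}


def count_levels(data):
    """Count the rows of each level in an image_to_data dict (hypothetical helper)."""
    return Counter(LEVEL_NAMES[int(level)] for level in data["level"])


# Hand-written sample: one page, one block, one paragraph, one line, two words.
sample = {"level": [1, 2, 3, 4, 5, 5]}
counts = count_levels(sample)
print(dict(counts))
```

Only the level 5 rows carry recognized words in the text column, so filtering on level == 5 is usually the first step when you just want the words.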

    Column block_num: Block number of the detected text or item
    Column par_num: Paragraph number of the detected text or item
    Column line_num: Line number of the detected text or item
    Column word_num: Word number of the detected text or item

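    These four columns are interconnected, though. When an item starts a new line, the word number starts counting again instead of continuing from the last word of the previous line (the line's own row has word_num 0, and its first word has word_num 1). The same goes for line_num, par_num, and block_num: each restarts whenever the level above it changes.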
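
Because the counters restart, a word is only identified uniquely by the combination of block_num, par_num, and line_num (plus word_num). A rough sketch of using that combination to rebuild the text line by line is shown here. It again assumes the Output.DICT shape; the function name words_by_line and the sample values are illustrative assumptions, not real OCR output.

```python
from collections import defaultdict


def words_by_line(data):
    """Join word-level rows that share the same (block_num, par_num, line_num)."""
    lines = defaultdict(list)
    for i, level in enumerate(data["level"]):
        if int(level) != 5:  # skip page, block, paragraph and line rows
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(data["text"][i])
    return {key: " ".join(words) for key, words in lines.items()}


# Hand-written sample: one block, one paragraph, two lines.
# Note that word_num restarts at 1 on the second line.
sample = {
    "level":     [1, 2, 3, 4, 5, 5, 4, 5],
    "block_num": [0, 1, 1, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1, 2, 2],
    "word_num":  [0, 0, 0, 0, 1, 2, 0, 1],
    "text":      ["", "", "", "", "Hello", "world", "", "Again"],
}
lines = words_by_line(sample)
print(lines)
```

With a real image you would pass the dict from pytesseract.image_to_data instead of the sample. For multi-page input, page_num would need to be part of the key as well.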

    Check out the image below for reference.
    1st column: block_num
    2nd column: par_num
    3rd column: line_num
    4th column: word_num
    [image: block_num, par_num, line_num and word_num columns of the image_to_data output]
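
For the image_to_osd part of the question, the string is a list of "key: value" lines, so it can be split into a dict for easier inspection. This is only a sketch: parse_osd is a hypothetical helper, and the input is the output quoted in the question. As far as I understand it, "Orientation in degrees" is the detected orientation of the page, "Rotate" is the rotation to apply to bring the page upright, "Script" is the detected writing system, and the two confidence values say how sure Tesseract is about each guess (higher means more certain).

```python
def parse_osd(osd_text):
    """Split each 'key: value' line of an image_to_osd string into a dict."""
    result = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            result[key.strip()] = value.strip()
    return result


# The OSD output quoted in the question.
osd = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 16.47
Script: Latin
Script confidence: 4.00"""
info = parse_osd(osd)
print(info)
```

All values are kept as strings here; convert them with int() or float() where you need numbers.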