
Detect if an OCR text image is upside down


I have a few hundred images of scanned documents, most of which are skewed, and I want to de-skew them using Python.
Here is the code I used:

import numpy as np
import cv2

from skimage.transform import radon


filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if w > 640:
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)

This code works well for most of the documents, but it cannot tell some angles apart: 0 and 180 degrees are often detected as the same angle, and so are 90 and 270 degrees. As a result, I get a lot of upside-down documents.
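
One way to see why 0 and 180 degrees (or 90 and 270) look identical to this kind of approach: rotating an image by 180 degrees only reverses each projection, and an order-independent statistic such as the RMS does not change when the values are reversed. The sketch below illustrates this with NumPy alone, on a random array standing in for a page. `row_profile_rms` is a hypothetical helper, not part of the code above, and it uses a single horizontal projection rather than the full Radon transform.

```python
import numpy as np

def row_profile_rms(gray):
    """RMS of the demeaned horizontal projection (one sum per image row)."""
    centered = gray - gray.mean()
    profile = centered.sum(axis=1)
    return float(np.sqrt(np.mean(profile ** 2)))

# Random array standing in for a grayscale page (illustrative data only)
rng = np.random.default_rng(0)
page = rng.random((64, 48))
flipped = np.rot90(page, 2)  # the same page rotated by 180 degrees

rms_page = row_profile_rms(page)
rms_flipped = row_profile_rms(flipped)
# The two values agree up to floating-point rounding
print(rms_page, rms_flipped, np.isclose(rms_page, rms_flipped))
```

Because the score is the same for both orientations, a projection-based de-skew needs a separate up/down test afterwards.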

Here is an example:
[image: example scanned document]

The resulting image is identical to the input image.

Is there a way to detect whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but that did not lead to a solution.


EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but only when the image contains a lot of characters.
For anyone who may need this:

import cv2
import pytesseract

file_name = 'path_to_filename'
# Print Tesseract's orientation and script detection (OSD) report
print(pytesseract.image_to_osd(cv2.imread(file_name)))

If the document contains enough characters, Tesseract can detect the orientation. However, when the image has only a few lines of text, the angle it suggests is usually wrong, so this is not a fully reliable solution.
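
Since the OSD result is unreliable on sparse pages, one option is to read the confidence value from the report and ignore the suggestion when it is low. The sketch below only parses the text that `image_to_osd` returns. The helper names, the `min_confidence` threshold of 2.0, and the sample report (made-up values in the usual `Key: value` layout) are all assumptions for illustration.

```python
def parse_osd(osd_text):
    """Parse 'Key: value' lines of an OSD text report into a dict of strings."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def rotation_if_confident(osd_text, min_confidence=2.0):
    """Return the suggested rotation in degrees, or None if confidence is low.

    min_confidence is an arbitrary illustrative threshold, not a Tesseract default.
    """
    fields = parse_osd(osd_text)
    confidence = float(fields.get('Orientation confidence', 0))
    if confidence < min_confidence:
        return None
    return int(fields['Rotate'])

# Made-up sample report, for illustration only
sample = (
    "Page number: 0\n"
    "Orientation in degrees: 180\n"
    "Rotate: 180\n"
    "Orientation confidence: 7.50\n"
    "Script: Latin\n"
    "Script confidence: 3.20\n"
)
print(rotation_if_confident(sample))
```

In real use, the string passed in would be the return value of `pytesseract.image_to_osd(...)`. A page that returns `None` would then fall back to another orientation check.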


Solution

  • Python3/OpenCV4 script to align scanned documents.

    Rotate the document and sum the rows. When the document is rotated by 0 or 180 degrees, the text lines are horizontal, so many rows (the gaps between the lines) sum to zero and show up as black:

    [image: rotate to find maximum zebra]

    Use a score-keeping method: score each rotated image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.

    # Rotate the image around in a circle
    angle = 0
    scores = []  # one score per tested angle
    while angle <= 360:
        # Rotate the source image
        img = rotate(src, angle)
        # Crop the center 1/3rd of the image (roi is filled with text)
        h, w = img.shape
        buffer = min(h, w) - int(min(h, w) / 1.15)
        roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
        # Create background to draw transform on
        bg = np.zeros((buffer*2, buffer*2), np.uint8)
        # Compute the sums of the rows
        row_sums = sum_rows(roi)
        # Low score --> zebra stripes (fewer rows contain any text)
        score = np.count_nonzero(row_sums)
        scores.append(score)
        # Image has the best rotation so far
        if score <= min(scores):
            # Save the rotated image
            print('found optimal rotation')
            best_rotation = img.copy()
        k = display_data(roi, row_sums, buffer)
        if k == 27:  # key code 27 (Esc) stops the search early
            break
        # Increment angle and try again
        angle += .75
    cv2.destroyAllWindows()
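
The scoring idea can be tried in isolation on a synthetic page. In the minimal sketch below, `sum_rows` and `zebra_score` are hypothetical stand-ins for the omitted helpers, not the answer's actual code. It assumes white text on a black background and uses `np.rot90` in place of an arbitrary-angle rotation so that it needs only NumPy.

```python
import numpy as np

def sum_rows(binary):
    """Sum pixel values along each row (text assumed white on black)."""
    return binary.sum(axis=1, dtype=np.int64)

def zebra_score(binary):
    """Number of rows that contain any text; lower means cleaner stripes."""
    return int(np.count_nonzero(sum_rows(binary)))

# Synthetic page: three horizontal "text lines" on a black background
page = np.zeros((60, 60), np.uint8)
for top in (10, 20, 30):
    page[top:top + 4, 5:55] = 255

aligned = zebra_score(page)
sideways = zebra_score(np.rot90(page))  # same page rotated by 90 degrees
print(aligned, sideways)
```

The aligned page leaves most rows empty, while the sideways page puts text in nearly every row, which is why the loop keeps the rotation with the lowest score.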
    

    [image: best rotation]

    How to tell if the document is upside down? Fill in the area from the top of the page down to the first non-black pixel in the image, then measure that area (shown in yellow). The orientation with the smaller area is the one that is right-side up:

    [images: right side up; upside down]

    # Find the area from the top of page to top of text
    _, bg = area_to_top_of_text(best_rotation.copy())
    right_side_up = int(np.sum(bg))  # np.sum avoids wraparound if bg is uint8
    # Flip image and try again
    best_rotation_flipped = rotate(best_rotation, 180)
    _, bg = area_to_top_of_text(best_rotation_flipped.copy())
    upside_down = int(np.sum(bg))
    # Keep the orientation with the smaller area
    if right_side_up < upside_down:
        aligned_image = best_rotation
    else:
        aligned_image = best_rotation_flipped
    # Save aligned image
    cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
    cv2.destroyAllWindows()
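
The area comparison can be sketched without the omitted `area_to_top_of_text` helper. The version below is a hypothetical stand-in, not the answer's implementation: for every column it counts the black pixels above the first text pixel, and a column with no text counts for the full height. The synthetic page assumes white text on black, equal top and bottom margins, and a short last line, which is what makes the two orientations differ here.

```python
import numpy as np

def area_above_text(binary):
    """Count the pixels between the top edge and the first text pixel, per column."""
    h, _ = binary.shape
    mask = binary > 0
    first = np.argmax(mask, axis=0)      # first text row per column (0 if none)
    first[~mask.any(axis=0)] = h         # columns without text count in full
    return int(first.sum())

# Synthetic page: two full lines and a short last line, equal margins
page = np.zeros((22, 30), np.uint8)
page[5:7, 2:28] = 255
page[10:12, 2:28] = 255
page[15:17, 2:12] = 255                  # short last line of the paragraph

area_up = area_above_text(page)
area_down = area_above_text(np.rot90(page, 2))  # page rotated by 180 degrees
is_upside_down = area_up > area_down
print(area_up, area_down, is_upside_down)
```

This only separates the two orientations when the top of the text block is fuller than the bottom, so pages that look the same from both ends would need a different cue.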