I am trying to extract text from documents using OCR, but for my use case I need character-level coordinates.
I am getting the following output from EasyOCR:
```
[
  [[60, 88], [639, 88], [639, 124], [60, 124]],
  "Some phrase",
  0.6820449765391986
]
```
Is it possible to get character-level coordinates without manually calculating them for the phrase?
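For comparison, "manually calculating" the coordinates could be as simple as slicing the phrase box into equal-width slots, one per character. This is a minimal sketch, assuming an axis-aligned box given as EasyOCR-style corner points (top-left, top-right, bottom-right, bottom-left) and roughly uniform character widths, e.g. a monospaced font. `split_evenly` is a hypothetical helper, not part of EasyOCR.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in image coordinates


def split_evenly(bbox: List[List[int]], text: str) -> List[Tuple[str, Box]]:
    """Divide a phrase box into equal-width slots, one per character (spaces included).

    Assumes roughly uniform character widths; with proportional fonts the
    slots drift away from the real glyphs.
    """
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    n = len(text)
    if n == 0:
        return []
    result = []
    for i, ch in enumerate(text):
        # Integer slot edges, so neighbouring slots share a boundary without gaps
        left = x_min + (x_max - x_min) * i // n
        right = x_min + (x_max - x_min) * (i + 1) // n
        result.append((ch, (left, y_min, right - left, y_max - y_min)))
    return result


bbox = [[60, 88], [639, 88], [639, 124], [60, 124]]
print(split_evenly(bbox, "Some phrase")[0])  # ('S', (60, 88, 52, 36))
```

With a proportional font a narrow "i" and a wide "m" get the same slot, which is why an image-based approach is more reliable.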
This is not natively supported by `easyocr`. You can, however, take the provided bounding boxes and separate the characters by scanning horizontally along the mid-line (or the baseline) of each box. I used `cv2` to binarize the region inside the bounding box and collect the connected components that the scan line crosses.
For each character box I took the width of its connected component and kept the height of the `easyocr` box.
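The core of this idea fits in a few lines. The sketch below is a pure-Python stand-in for the `cv2` calls, written only to show the logic on a tiny hand-made binary grid: label 8-connected components, keep the ones the middle row crosses, and report each one's horizontal extent from left to right. `label_components` and `centerline_spans` are hypothetical names; in real code `cv2.connectedComponentsWithStats` does the labelling far faster. Note that a component lying entirely above the centerline (like the dot of an "i") is ignored; since the box height comes from the phrase box anyway, this usually does not matter.

```python
from collections import deque
from typing import List, Tuple


def label_components(binary: List[List[int]]) -> List[List[int]]:
    """8-connected component labelling via BFS; 0 stays background.

    Illustrative stand-in for cv2.connectedComponents, not a replacement.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not labels[sy][sx]:
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = next_label
                                queue.append((ny, nx))
                next_label += 1
    return labels


def centerline_spans(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (x, width) of every component crossing the middle row, left to right."""
    labels = label_components(binary)
    center = len(binary) // 2
    hit = {lab for lab in labels[center] if lab}
    spans = {}
    for row in labels:
        for x, lab in enumerate(row):
            if lab in hit:
                lo, hi = spans.get(lab, (x, x))
                spans[lab] = (min(lo, x), max(hi, x))
    return sorted((lo, hi - lo + 1) for lo, hi in spans.values())


grid = [
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # a "dot" above the centerline
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 0, 0],  # centerline (row 2 of 5)
    [1, 1, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(centerline_spans(grid))  # [(0, 2), (3, 2), (6, 1)]
```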
The example image (`testocr.png`) is taken from this site. The following code should get you started; you can adapt it to your needs:
```python
import easyocr
import cv2
import numpy as np

# Run EasyOCR
reader = easyocr.Reader(['en'])
image_path = 'testocr.png'
image = cv2.imread(image_path)
results = reader.readtext(image_path)

image_copy = image.copy()  # Do not destroy the original image with debug annotations
char_boxes = []  # One (text, [(x, y, w, h), ...]) entry per OCR result

# Loop over all OCR results
for bbox, text, conf in results:
    # 1. Crop phrase bounding box (clipped so slicing never starts at a negative index)
    pts = np.array(bbox, dtype=np.int32)
    x_min, y_min = max(pts[:, 0].min(), 0), max(pts[:, 1].min(), 0)
    x_max, y_max = pts[:, 0].max(), pts[:, 1].max()
    roi = image[y_min:y_max, x_min:x_max]
    if roi.size == 0:
        continue

    # 2. Binarization (Otsu, inverted so dark text on a light background
    #    becomes foreground) and connected component analysis
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

    # 3. Traverse horizontal centerline and collect every component it hits
    center_y = roi.shape[0] // 2
    hit_labels = set()
    for x in range(roi.shape[1]):
        label = labels[center_y, x]
        if label != 0:  # 0 is the background
            hit_labels.add(label)

    # 4. Record and draw boxes, ordered left to right
    boxes = []
    for label in sorted(hit_labels, key=lambda lab: stats[lab, cv2.CC_STAT_LEFT]):
        x, y, w, h, area = stats[label]
        # Horizontal extent from the component, vertical extent from the original box
        box = (int(x + x_min), int(y_min), int(w), int(y_max - y_min))
        boxes.append(box)
        cv2.rectangle(image_copy, (box[0], box[1]),
                      (box[0] + box[2], box[1] + box[3]), (0, 255, 0), 1)
    char_boxes.append((text, boxes))

cv2.imwrite('output.png', image_copy)
```
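To get from boxes to characters you still have to pair them with the recognized text. A minimal sketch, assuming you collect each phrase's `box` tuples from step 4 into a list: sort them left to right and zip them with the non-space characters of `text`. `assign_characters` is a hypothetical helper. It deliberately returns `None` when the counts disagree, which happens whenever a component is not exactly one character: touching glyphs that merge into one component, broken glyphs that split into two, or punctuation such as `.` or `,` that never reaches the centerline.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in image coordinates


def assign_characters(boxes: List[Box], text: str) -> Optional[List[Tuple[str, Box]]]:
    """Pair left-to-right component boxes with the non-space characters of text.

    Returns None on a count mismatch so the caller can fall back to another
    strategy instead of silently mislabelling boxes.
    """
    chars = [c for c in text if not c.isspace()]
    ordered = sorted(boxes, key=lambda b: b[0])  # sort by left edge
    if len(ordered) != len(chars):
        return None
    return list(zip(chars, ordered))


boxes = [(120, 88, 20, 36), (60, 88, 18, 36), (90, 88, 22, 36)]
print(assign_characters(boxes, "a b c"))
# [('a', (60, 88, 18, 36)), ('b', (90, 88, 22, 36)), ('c', (120, 88, 20, 36))]
```

Returning `None` rather than guessing keeps mismatches visible; a reasonable fallback is to split that phrase's box evenly or to inspect it by hand.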