I have a diagram in PDF format. I am using pdfminer.six to extract the text in the diagram as well as the bounding boxes of the text. Everything is fine so far.
System info: Windows 10, Python 3.9.13
Now I want to draw these bounding boxes on an image of the PDF and create a visualization using OpenCV's rectangle(). I am unsure how to do this, as a DPI is needed to convert the PDF to an image using pdf2image.
Can anyone tell me how to draw this visualization using the bounding box data given by pdfminer?
Code
Below is example code for the bounding box extraction using pdfminer, followed by sample output showing how the bounding boxes are returned.
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextBox, LAParams
import cv2 as cv
import os
import numpy as np

path = r"sample.pdf"
assert os.path.exists(path), "PDF path is wrong"

laparams = LAParams(detect_vertical=True)
for page_layout in extract_pages(path, laparams=laparams):
    for element in page_layout:
        if isinstance(element, LTTextBox):
            # bbox is (x0, y0, x1, y1) in PDF points
            print(element.bbox)
A snapshot of the output I am getting is as follows:
....
(64.46833119999998, 758.4685649600001, 143.16671221999994, 763.35576496)
(399.3279, 797.7414805999999, 464.28060816000004, 812.3556692000001)
(520.1078, 797.7414805999999, 631.1472937599999, 812.3556692000001)
(676.9479, 797.7414805999999, 762.4986252799999, 812.3556692000001)
(709.8279, 787.0014805999999, 729.9863304, 792.5868806)
....
This is how I solved the problem: the bounding boxes of all text lines are collected into a pandas DataFrame (a plain list works as well). I use this function to draw a rectangle on the page:
import random

def draw(img, BBOX, Size, randomcolor: bool = False):
    # pdfminer measures y from the bottom of the page, OpenCV from the top,
    # so flip y using the page height Size[3].
    rsx = int(np.floor(BBOX[0]))
    rsy = int(np.floor(Size[3]) - np.floor(BBOX[1]))
    rex = int(np.floor(BBOX[2]))
    rey = int(np.floor(Size[3]) - np.floor(BBOX[3]))
    if randomcolor:
        R = random.randint(20, 255)
        G = random.randint(20, 255)
        B = random.randint(20, 255)
    else:
        R = 255
        G = 255
        B = 255
    # OpenCV expects colors in BGR order.
    cv.rectangle(img, (rsx, rsy), (rex, rey), (B, G, R), 1)
The following function then handles the input:
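def draw_textlines(list_of_BBOXes, page_size, filename, randomcolor=False):
    # page_size is the media box (x0, y0, x1, y1). A 3-channel canvas is used
    # so that random colors are actually visible.
    img = np.zeros((int(page_size[3]), int(page_size[2]), 3), np.uint8)
    for BBOX in list_of_BBOXes:
        draw(img, BBOX, page_size, randomcolor)
    cv.imwrite(filename, img)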
The media box of the page can be used as the image size (when converted to int). You can scale the output page as you wish.
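On the DPI part of the question: pdfminer reports coordinates in PDF points (1/72 inch) with the origin at the bottom-left of the page, so drawing on a page rendered at some DPI comes down to multiplying by dpi / 72 and flipping the y axis. Below is a minimal sketch of that conversion. The helper name `bbox_to_pixels` and the example page size are my own choices for illustration, and it assumes an unrotated page whose media box starts at (0, 0).

```python
PDF_POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def bbox_to_pixels(bbox, page_height_pt, dpi):
    """Map a pdfminer bbox (x0, y0, x1, y1), in points with a bottom-left
    origin, to (left, top, right, bottom) in pixels with a top-left origin.

    Assumes an unrotated page whose media box starts at (0, 0).
    """
    scale = dpi / PDF_POINTS_PER_INCH
    x0, y0, x1, y1 = bbox
    left = int(round(x0 * scale))
    right = int(round(x1 * scale))
    # Flip y: the top edge of the box (y1) is closest to the top of the image.
    top = int(round((page_height_pt - y1) * scale))
    bottom = int(round((page_height_pt - y0) * scale))
    return left, top, right, bottom


# Hypothetical example: a 612 x 792 pt page rendered at 144 DPI (scale = 2).
print(bbox_to_pixels((72, 72, 144, 144), page_height_pt=792, dpi=144))
# -> (144, 1296, 288, 1440)
```

If the page image comes from pdf2image's `convert_from_path(path, dpi=dpi)`, passing the same `dpi` here should make the rectangles line up with the rendered text, possibly off by a pixel due to rounding. The page height in points can be taken from the media box, as above.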