
How to visualize bounding boxes extracted from pdfminer.six?


I have a diagram in PDF format. I am using pdfminer.six to extract the text in the diagram as well as the bounding boxes of the text. Everything is fine so far.

System info: Windows 10, Python 3.9.13

Now I want to draw these bounding boxes on an image of the PDF and create a visualization using OpenCV's rectangle(). I am unsure how to do this, because converting the PDF to an image with pdf2image requires choosing a DPI.

Can anyone tell me how to draw this visualization using the bounding box data given by pdfminer?

Code: I am providing example code for the bounding box extraction with pdfminer, along with sample output showing how the bounding boxes are returned.

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextBox, LAParams
import cv2 as cv
import os
import numpy as np

path = r"sample.pdf"
assert os.path.exists(path), "PDF path is wrong"

laparams = LAParams(detect_vertical=True)

for page_layout in extract_pages(path, laparams=laparams):
    for element in page_layout:
        if isinstance(element, LTTextBox):
            print(element.bbox)

A snapshot of the output I am getting is as follows:

....
(64.46833119999998, 758.4685649600001, 143.16671221999994, 763.35576496)
(399.3279, 797.7414805999999, 464.28060816000004, 812.3556692000001)
(520.1078, 797.7414805999999, 631.1472937599999, 812.3556692000001)
(676.9479, 797.7414805999999, 762.4986252799999, 812.3556692000001)
(709.8279, 787.0014805999999, 729.9863304, 792.5868806)
....

Solution

  • This is how I solved the problem: the bounding boxes of all text lines are converted into a Pandas data frame (a list works as well). I use this function to draw a rectangle on the page:

    import random

    def draw(img, BBOX, Size, randomcolor: bool = False):
        # pdfminer's y axis starts at the bottom of the page, while image rows
        # count down from the top, so y is flipped using the page height Size[3].
        rsx = int(np.floor(BBOX[0]))
        rsy = int(np.floor(Size[3]) - np.floor(BBOX[1]))
        rex = int(np.floor(BBOX[2]))
        rey = int(np.floor(Size[3]) - np.floor(BBOX[3]))

        if randomcolor:
            R = random.randint(20, 255)
            G = random.randint(20, 255)
            B = random.randint(20, 255)
        else:
            R = 255
            G = 255
            B = 255
        # OpenCV expects colors in BGR order
        cv.rectangle(img, (rsx, rsy), (rex, rey), (B, G, R), 1)
    

    The following function then handles the input:

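    def draw_textlines(list_of_BBOXes, page_size, filename, randomcolor=False):
      # page_size is the media box (x0, y0, x1, y1); three channels so colors show up
      img = np.zeros((int(page_size[3]), int(page_size[2]), 3), np.uint8)
      for BBOX in list_of_BBOXes:
        # each BBOX is drawn against the same page size
        draw(img, BBOX, page_size, randomcolor)
      cv.imwrite(filename, img)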
    

    The media box of the page can be used as the image size (when converted to int). You can scale the output page as you wish.
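
To draw on a page rendered at a chosen DPI instead of a blank canvas, the same y flip can be combined with a scale factor. The following is a minimal sketch, not part of the original answer: `bbox_to_pixels` and `PDF_POINTS_PER_INCH` are hypothetical names, and it assumes the page uses the default PDF unit of 1/72 inch, has no rotation, and has a media box starting at (0, 0).

```python
PDF_POINTS_PER_INCH = 72.0  # assumed default PDF unit: 1 point = 1/72 inch


def bbox_to_pixels(bbox, page_height_pt, dpi=72):
    """Convert a pdfminer bbox (x0, y0, x1, y1) in points, origin at the
    bottom-left, into integer pixel corners (left, top, right, bottom)
    with the origin at the top-left, for an image rendered at `dpi`."""
    scale = dpi / PDF_POINTS_PER_INCH
    x0, y0, x1, y1 = bbox
    left = int(round(x0 * scale))
    right = int(round(x1 * scale))
    # Flip y: the top edge of the box (y1) is closest to the top of the image.
    top = int(round((page_height_pt - y1) * scale))
    bottom = int(round((page_height_pt - y0) * scale))
    return left, top, right, bottom


if __name__ == "__main__":
    # One bbox from the question; the 842 pt page height is only an assumption.
    box = (399.3279, 797.7414805999999, 464.28060816000004, 812.3556692000001)
    print(bbox_to_pixels(box, page_height_pt=842.0, dpi=144))
```

The returned corners can be passed to `cv.rectangle(img, (left, top), (right, bottom), color, 1)`, where `img` is the page rendered at the same DPI, for example with pdf2image's `convert_from_path(path, dpi=dpi)` and converted to a NumPy array. The page height in points should be available as `page_layout.height` in the extraction loop from the question.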