I'm trying to detect tables in an image, but the image's structure makes it hard to extract them with the usual libraries, so I decided to extract them as images instead. I used the code below to draw bounding boxes around the rectangles. It works, but it does not seem to detect the rectangles in lighter colors.
This is the code I'm using:
import numpy as np
import cv2
# Load the image and keep an untouched copy
image = cv2.imread("aaaaaaaaaaa.jpg")
result = image.copy()
# Grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# adaptive threshold
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,51,9)
# Fill rectangular contours
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (255,255,255), -1)
# Morph open
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=4)
# Draw rectangles, the 'area_treshold' value was determined empirically
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
area_treshold = 4000
for c in cnts:
    if cv2.contourArea(c) > area_treshold:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
cv2.imwrite('thresh.jpg', thresh)
cv2.imwrite('opening.jpg', opening)
cv2.imwrite('image.jpg', image)
cv2.waitKey()
This is the input: [image: input image]
As the output shows, it only detects the boxes that are dark enough and not the lighter ones: [image: output]
Any help will be greatly appreciated.
If you know the color of the table, what about using cv2.inRange() to threshold the image?
Here is an example with two different HSV ranges to separate the table header from the body:
import numpy as np
import cv2
image = cv2.imread("image.jpg")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
thresh_header = cv2.inRange(hsv, (10, 130, 0), (20, 140, 255))
thresh_body = cv2.inRange(hsv, (10, 80, 0), (20, 90, 255))
You can now reuse the code you've done to find the contours and draw the rectangles:
area_treshold = 4000  # same empirical value as in the question

def draw_countours(image, thresh, color):
    # OpenCV 4.x returns (contours, hierarchy)
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in cnts:
        # Skip small noise contours
        if cv2.contourArea(c) < area_treshold:
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 3)

draw_countours(image, thresh_header, (0, 0, 255))  # header in red (BGR)
draw_countours(image, thresh_body, (255, 0, 0))    # body in blue (BGR)
Note that, of course, it only works if you know in advance the color of the table.
EDIT from comments
Is there a way I can extract or crop the whole table, that is, the header and the body together?
In this case, you can extend the HSV range used to threshold the image to get both header and body in the same mask (thresh_header + thresh_body).
To get the overall rectangle, without the lines, you can use a morphological transformation such as closing. Here is an example:
# Threshold with a broader HSV range
thresh = cv2.inRange(hsv, (10, 80, 0), (20, 140, 255))
kernel = np.ones((10, 10), np.uint8)
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
Here is the contour obtained using the draw_countours() function defined above: [image: resulting contour]
You can now crop the image using the rectangle's bounding box. Make sure to filter out noise contours so that only the main contour of interest remains.
cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rects = [cv2.boundingRect(cnt) for cnt in cnts if cv2.contourArea(cnt) > area_treshold]
x, y, w, h = rects[0]
crop = image[y:y+h, x:x+w]