I’m researching how to extract information from ID cards and have found a suitable way to locate the face on the front: OpenCV has Haar cascades for that. What I’m unsure about is how to extract the full rectangle that contains the person’s photo, rather than just the face (as is done in https://github.com/deepc94/photo-id-ocr). The few ideas that I’m yet to test are:
What else would you recommend trying? Any thoughts, ideas or existing examples are welcome.
Normal approach:
import cv2
import numpy as np
import matplotlib.pyplot as plt
image = cv2.imread("a.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_,thresh = cv2.threshold(gray,128,255,cv2.THRESH_BINARY)
cv2.imshow("thresh",thresh)
thresh = cv2.bitwise_not(thresh)
element = cv2.getStructuringElement(shape=cv2.MORPH_RECT, ksize=(7, 7))
# Dilate, then erode, with the 7x7 kernel (one iteration each; pass iterations=N to repeat).
dilate = cv2.dilate(thresh, element)
cv2.imshow("dilate", dilate)
erode = cv2.erode(dilate, element)
cv2.imshow("erode", erode)
# Close the remaining small gaps in the mask.
morph_img = cv2.morphologyEx(erode, cv2.MORPH_CLOSE, element)
cv2.imshow("morph_img",morph_img)
# findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4; [-2] is the contour list in both.
contours = cv2.findContours(morph_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
areas = [cv2.contourArea(c) for c in contours]
order = np.argsort(areas)  # contour indices, smallest area first
cnt = contours[order[-3]]  # the third biggest contour is the face
r = cv2.boundingRect(cnt)
cv2.rectangle(image,(r[0],r[1]),(r[0]+r[2],r[1]+r[3]),(0,0,255),2)
cv2.imshow("img",image)
cv2.waitKey(0)
cv2.destroyAllWindows()
I found that the two biggest contours are the boundary and the third biggest contour is the face.
Another way to investigate the image is to sum the pixel values along each axis:
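Picking a contour by fixed rank depends on how many boundary contours a given image produces. As a possible variation (not part of the approach above), the sketch below drops every bounding box that covers most of the image and takes the largest of what is left. The helper name `pick_photo_rect` and the `max_fraction` default of 0.5 are assumptions for illustration; the boxes are `(x, y, w, h)` tuples in the form returned by `cv2.boundingRect`.

```python
def pick_photo_rect(rects, image_w, image_h, max_fraction=0.5):
    """Return the largest (x, y, w, h) box that covers at most
    max_fraction of the image area, or None if no box qualifies.

    Hypothetical helper: max_fraction is an assumed cut-off meant to
    discard contours that trace the card or image boundary.
    """
    image_area = float(image_w * image_h)
    candidates = [
        r for r in rects
        if (r[2] * r[3]) / image_area <= max_fraction
    ]
    if not candidates:
        return None
    # Largest remaining box by area.
    return max(candidates, key=lambda r: r[2] * r[3])


# Synthetic boxes: two boundary-like boxes, one photo-sized box, one text line.
example_rects = [(0, 0, 100, 60), (2, 2, 96, 56), (10, 10, 30, 40), (50, 20, 20, 5)]
photo_rect = pick_photo_rect(example_rects, image_w=100, image_h=60)
```

With real contours, the list could be built as `[cv2.boundingRect(c) for c in contours]`; the cut-off would need tuning per card layout.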
x_hist = np.sum(morph_img,axis=0).tolist()
plt.plot(x_hist)
plt.ylabel('sum of pixel values by X-axis')
plt.show()
y_hist = np.sum(morph_img,axis=1).tolist()
plt.plot(y_hist)
plt.ylabel('sum of pixel values by Y-axis')
plt.show()
Based on those pixel sums along the two axes, you can crop the region you want by setting thresholds on them.
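A minimal sketch of that thresholding step, using only NumPy on a synthetic mask. The function name `crop_bounds_from_projections` and the `frac` default of 0.5 are assumptions for illustration; a real card would likely need a different threshold, and this simple version takes the outermost columns and rows that pass it.

```python
import numpy as np


def crop_bounds_from_projections(mask, frac=0.5):
    """Return (x0, x1, y0, y1), end-exclusive, spanning the columns and rows
    whose pixel sum is at least `frac` of the largest sum on that axis.
    Returns None for an all-zero mask.
    """
    mask = np.asarray(mask, dtype=np.float64)
    col_sums = mask.sum(axis=0)  # one value per x position
    row_sums = mask.sum(axis=1)  # one value per y position
    if col_sums.max() == 0:
        return None
    xs = np.flatnonzero(col_sums >= frac * col_sums.max())
    ys = np.flatnonzero(row_sums >= frac * row_sums.max())
    return int(xs[0]), int(xs[-1]) + 1, int(ys[0]), int(ys[-1]) + 1


# Synthetic binary mask with one bright rectangle standing in for the photo.
mask = np.zeros((60, 100), dtype=np.uint8)
mask[10:50, 20:45] = 255

bounds = crop_bounds_from_projections(mask)
x0, x1, y0, y1 = bounds
crop = mask[y0:y1, x0:x1]
```

On the real image, `morph_img` would take the place of `mask`, and the same bounds could then be used to slice the original `image`.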