Extract images of people in a call from a Google Meet recording

I want to extract each individual person from a video screenshot as a separate image. From this frame I want 5 images, exported as 1.jpg, 2.jpg, ..., 5.jpg, by creating a bounding box for each video tile.

[Image: zoom conference example]

How would you tackle this? I need a robust method.

Is there a fast, simple method I'm not thinking of? Is there an ML model that takes care of this, or is basic image processing the way to go?

Thanks in advance

I tried OpenCV thresholding, but the background color also appears in the attendees' videos, which adds noise, as you can see:

[Image: thresholding result]

Solution

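Your thresholding result looks fine to me. The noise is mostly black specks inside each white camera view, so findContours() with RETR_EXTERNAL followed by boundingRect() would clean it up: only the outer boundary of each view is kept, and its bounding rectangle covers the black parts inside. contourArea() can then be used to stop small white specks from being treated as their own camera view.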

So, for example, here's how to run findContours():

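import cv2
import matplotlib.pyplot as plt

# This is your post-thresholding image (white camera views on black)
img_orig = cv2.imread('test183_image.png')
img_gray = cv2.cvtColor(img_orig, cv2.COLOR_BGR2GRAY)
# Find the outer contours (OpenCV 4.x returns two values; 3.x returns three)
contours, hierarchy = cv2.findContours(img_gray, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Draw them for debug purposes
img = img_orig.copy()
cv2.drawContours(img, contours, -1, (0, 255, 0), 10)
# OpenCV stores images as BGR; convert to RGB so matplotlib shows true colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()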

Output:

There's a seventh contour here, in the upper left corner of the image. It can be filtered out like this:

# Reject any contour smaller than min_area
min_area = 20000  # in square pixels
contours = [contour for contour in contours if cv2.contourArea(contour) >= min_area]

Output:

The next step is to find the minimum bounding rectangle for each camera using boundingRect():

# Get the bounding rectangle (x, y, w, h) of each contour
bounding_rects = [cv2.boundingRect(contour) for contour in contours]
# Draw each rectangle for debug purposes
img = img_orig.copy()
for x, y, w, h in bounding_rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 10)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()

Output:

In the bounding_rects list, you now have the x, y, width, and height of every camera.