python opencv average rgb face-detection

Using OpenCV face detection to obtain RGB pixels of a recorded video


I'm new to Python and currently studying it. I'm using an OpenCV Haar cascade for face detection (which already works), and I'm now trying to obtain the RGB pixels of each frame of a recorded video and plot the average RGB value of each channel per frame in a graph. The problem I encountered is that the graph only displays the RGB values from the first few seconds and then stops, and the output is probably not an average.

I've seen some samples that use .append to obtain the average of their RGB images, and I tried it expecting the obtained RGB values to be averaged per frame, giving me three lines in a line graph. However, the graph that came out only showed the RGB values of the first few seconds and stopped there (the recorded video is around 20 seconds long), and the output was not the average of the RGB values.

The Python code below is the one I'm using. I would very much appreciate any help and advice. Thank you.

import cv2
from PIL import Image
import datetime
import matplotlib.pyplot as plt
from statistics import mean

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

cap = cv2.VideoCapture('Video1.MOV')

b = []
g = []
r = []

while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for x, y, w, h in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        face = img[y: y + h, x: x + w]
        face_gray = gray[y: y + h, x: x + w]

        t_now = datetime.datetime.now().time()
        print('The Time is:{}'.format(t_now))

        B,G,R = face[150, 150]

        print('BLUE:{}'.format(B), 'GREEN:{}'.format(G), 'RED:{}'.format(R))
        print('Size of ROI:{}'.format(face.shape))

        b.append(B)
        g.append(G)
        r.append(R)

        plt.xlabel('Time(s)')
        plt.ylabel('RGB Signals')

    plt.plot(b, g, label = "line 1, Blue", color = 'blue')
    plt.plot(g, label = "line 2, Green", color = 'green')
    plt.plot(r, label = "line 3, Red", color = 'red')

    plt.legend()
    plt.show()

    cv2.imshow('video image', img)
    fps = cap.get(cv2.CAP_PROP_FPS)
    print('FPS:{}'.format(fps))
    key = cv2.waitKey(10)
    if key == 27:
        break

cap.release()
cv2.destroyAllWindows()

Solution

  • On the line:

        B, G, R = face[150, 150]

    you get the intensity values of only one single pixel inside the detected face, at coordinates (150, 150). I don't think this is what you intended. In your question you wrote that you are interested in the intensity values for the whole frame, but I assume you actually mean the values inside the bounding box around the detected face.
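
    As a quick illustration of the difference, here is a tiny sketch using a dummy array in place of a real face crop:

        import numpy as np

        # a dummy 200x200, 3-channel "face" crop, just to show the indexing
        face = np.zeros((200, 200, 3), dtype=np.uint8)

        print(face[150, 150])       # one pixel: a single [B G R] triple
        print(face[:, :, 0].shape)  # (200, 200) - the entire Blue channel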

    In addition, the values obtained do not represent any average, because your code never calculates one. The append method you mentioned simply inserts an element at the end of a list.
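
    A minimal sketch of what append actually does:

        b = []
        b.append(10)
        b.append(20)
        b.append(30)
        print(b)                # [10, 20, 30] - append only collects the values
        print(sum(b) / len(b))  # 20.0 - the average must be computed separately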

    You can check the shape of the face variable: the first two values describe the size of the region in pixels, and the third is the number of channels - 3 in our case, since the image has three color channels (note that OpenCV stores them in BGR order, not RGB). So you need to index each channel separately and average it before assigning the result to B, G and R. The average can be computed e.g. with the mean function from the numpy package.

        B = np.mean(face[:, :, 0])  # Blue channel (OpenCV uses BGR order)
        G = np.mean(face[:, :, 1])  # Green channel
        R = np.mean(face[:, :, 2])  # Red channel

        b.append(B)
        g.append(G)
        r.append(R)


    I have included the entire code below:

    import cv2
    import datetime
    import matplotlib.pyplot as plt
    import numpy as np

    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
    cap = cv2.VideoCapture('Video1.MOV')

    b = []
    g = []
    r = []

    while True:
        ret, img = cap.read()
        if not ret:  # the video has ended (or the frame could not be read)
            break
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
        for x, y, w, h in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            face = img[y: y + h, x: x + w]
            face_gray = gray[y: y + h, x: x + w]

            t_now = datetime.datetime.now().time()
            print('The Time is:{}'.format(t_now))

            B = np.mean(face[:, :, 0])  # Blue channel
            G = np.mean(face[:, :, 1])  # Green channel
            R = np.mean(face[:, :, 2])  # Red channel

            b.append(B)
            g.append(G)
            r.append(R)

            print('BLUE:{}'.format(B), 'GREEN:{}'.format(G), 'RED:{}'.format(R))
            print('Size of ROI:{}'.format(face.shape))

        cv2.imshow('video image', img)
        key = cv2.waitKey(10)
        if key == 27:
            break

    cap.release()
    cv2.destroyAllWindows()

    # Plot the collected per-frame averages once, after the whole video is processed
    plt.xlabel('Frame')
    plt.ylabel('RGB Signals')
    plt.plot(b, label = "line 1, Blue", color = 'blue')
    plt.plot(g, label = "line 2, Green", color = 'green')
    plt.plot(r, label = "line 3, Red", color = 'red')
    plt.legend()
    plt.show()


    However, if you want to calculate the average pixel values over the whole frame and not just inside the bounding box, simply substitute face with img in the inner loop.
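
    For example, the channel averages for the whole frame would then be computed as:

        B = np.mean(img[:, :, 0])  # average Blue over the entire frame
        G = np.mean(img[:, :, 1])  # average Green
        R = np.mean(img[:, :, 2])  # average Red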

    Your code also had a couple of other issues. Calling plt.show() inside the loop blocks execution on every frame, which is most likely why your graph stopped after the first few seconds; in the code above the plotting lines were therefore moved outside the loop, so a single graph is drawn once at the end. The ret check is also important, because cap.read() returns an empty frame once the video ends, and passing that to cv2.cvtColor would crash the script.
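
    One last detail: the x axis of the plot is the sample index, not seconds. Since you already read the frame rate with cap.get(cv2.CAP_PROP_FPS), you can convert indices to approximate timestamps, assuming roughly one detected face per frame:

        fps = cap.get(cv2.CAP_PROP_FPS)  # read this before cap.release()
        t = np.arange(len(b)) / fps      # approximate time of each sample, in seconds
        plt.plot(t, b, color='blue')     # now 'Time(s)' is an accurate x label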