python opencv multiprocessing imutils

Multiprocess Video Processing


I would like to do video processing on neighboring frames. More specifically, I would like to compute the mean squared error between neighboring frames:

mean_squared_error(prev_frame,frame)

I know how to compute this in a straightforward, sequential way: I use the imutils package, which uses a queue to decouple loading the frames from processing them. Because the frames are stored in a queue, I don't need to wait for them before I can process them. But I want to be even faster.

# import the necessary packages to read the video
import imutils
from imutils.video import FileVideoStream
# package to compute mean squared error
from skimage.metrics import mean_squared_error

if __name__ == '__main__':

    # SPECIFY PATH TO VIDEO FILE
    path_video = "VIDEO_PATH.mp4"

    # START IMUTILS VIDEO STREAM
    # (transform_image is assumed to be a per-frame function defined elsewhere)
    print("[INFO] starting video file thread...")
    fvs = FileVideoStream(path_video, transform=transform_image).start()

    # INITIALIZE LIST to store the results
    mean_square_error_list = []

    # READ PREVIOUS FRAME
    prev_frame = fvs.read()

    # LOOP over frames from the video file stream
    while fvs.more():

        # GRAB THE NEXT FRAME from the threaded video file stream
        frame = fvs.read()

        # COMPUTE the metric
        metric_val = mean_squared_error(prev_frame,frame)
        mean_square_error_list.append(1-metric_val) # Append to list

        # UPDATE previous frame variable 
        prev_frame = frame

Now my question is: how can I multiprocess the computation of the metric to increase speed and save time?

My operating system is Windows 10 and I am using Python 3.8.0.


Solution

  • There are many ways to make this faster; I'll focus only on the multiprocessing part.

    Since we don't want to load the whole video into memory at once, we read it frame by frame.

    I'll use OpenCV (cv2) to read the frames and NumPy to calculate the MSE and save it to disk.

    First, we start without any multiprocessing to get a baseline for benchmarking. I'm using a 1920x1080 video at 60 FPS, 1:29 long, 100 MB in size.

    import cv2
    import sys
    import time
    
    import numpy as np
    import subprocess as sp
    import multiprocessing as mp
    
    filename = '2.mp4'
    
    def process_video():    
        cap = cv2.VideoCapture(filename)
    
        proc_frames = 0
    
        mse = []
        prev_frame = None
        ret = True
        while ret:
            ret, frame = cap.read() # reading frames sequentially
            if ret == False:
                break
    
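            if prev_frame is not None:
                # cast to float first: frames are uint8, so the difference would wrap around
                diff = prev_frame.astype(np.float32) - frame.astype(np.float32)
                c_mse = np.mean(np.square(diff))
                mse.append(c_mse)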
    
            prev_frame = frame
    
            proc_frames += 1
    
        # note: the data/ directory must already exist
        np.save('data/sp.npy', np.array(mse))
    
        cap.release()
        return
    
    
    if __name__ == "__main__":
    
        t1 = time.time()
    
        process_video()
    
        t2 = time.time()
    
        print(t2-t1)
    

    On my system, this runs in 142 seconds.
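
    A detail worth checking when computing the MSE with NumPy: cap.read() returns uint8 arrays, and arithmetic on uint8 wraps around modulo 256 instead of going negative. The sketch below uses two synthetic frames (not real video data) and a hypothetical helper, frame_mse, to show the difference between subtracting directly and casting to a wider type first.

```python
import numpy as np


def frame_mse(prev_frame, frame):
    """Mean squared error between two equally shaped frames.

    Hypothetical helper for illustration; not part of OpenCV or NumPy.
    """
    # Cast to float32 first so the difference is not computed in uint8.
    diff = prev_frame.astype(np.float32) - frame.astype(np.float32)
    return float(np.mean(diff * diff))


# Two tiny synthetic "frames" whose pixels differ by 20 everywhere.
a = np.zeros((2, 2, 3), dtype=np.uint8)
b = np.full((2, 2, 3), 20, dtype=np.uint8)

# Direct subtraction and squaring stay in uint8 and wrap around.
naive = float(np.mean(np.square(a - b)))
# Casting first gives the true squared difference, 20 * 20.
safe = frame_mse(a, b)

print("naive:", naive, "safe:", safe)
```

    The two values differ as soon as any pixel difference is 16 or more, because its square no longer fits in 8 bits.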

    Now, we can take the multiprocessing approach. The idea can be summarized in the following illustration.


    [Illustration: OpenCV multiprocessing frames. GIF credit: Google]


    We split the video into segments (one per CPU core) and process the segments in parallel.
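
    As a side note on the segmentation itself, here is a minimal sketch of one alternative way to split the work. It is not the scheme used in the code that follows: it divides the neighboring-frame pairs rather than the frames, so the remainder frames and the pairs at segment boundaries are covered by construction. The helper name pair_ranges is hypothetical.

```python
def pair_ranges(total_frames, num_workers):
    """Split the neighboring-frame pairs into contiguous half-open ranges.

    Pair k means (frame k, frame k + 1), so there are total_frames - 1 pairs.
    A worker assigned (start, stop) would read frames start..stop inclusive.
    """
    num_pairs = max(total_frames - 1, 0)
    base, extra = divmod(num_pairs, num_workers)

    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional pair each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 11 frames give 10 pairs, split across 3 workers.
print(pair_ranges(11, 3))
```

    Each worker reads one frame more than the number of pairs it owns, so adjacent workers both decode the frame at the boundary. That costs one extra frame per segment and removes the need for a separate boundary pass.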

    import cv2
    import sys
    import time
    
    import numpy as np
    import subprocess as sp
    import multiprocessing as mp
    
    filename = '2.mp4'
    
    def process_video(group_number):    
        cap = cv2.VideoCapture(filename)
        num_processes = mp.cpu_count()
        frame_jump_unit = cap.get(cv2.CAP_PROP_FRAME_COUNT) // num_processes
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_jump_unit * group_number)
        proc_frames = 0
    
        mse = []
        prev_frame = None
        while proc_frames < frame_jump_unit:
            ret, frame = cap.read()
            if ret == False:
                break
    
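            if prev_frame is not None:
                # cast to float first: frames are uint8, so the difference would wrap around
                diff = prev_frame.astype(np.float32) - frame.astype(np.float32)
                c_mse = np.mean(np.square(diff))
                mse.append(c_mse)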
    
            prev_frame = frame
    
            proc_frames += 1
    
        np.save('data/' + str(group_number) + '.npy', np.array(mse))
    
        cap.release()
        return
    
    
    if __name__ == "__main__":
    
        t1 = time.time()
    
        num_processes =  mp.cpu_count()
        print(f'CPU: {num_processes}')
    
        # only meta-data
        cap = cv2.VideoCapture(filename)
    
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_jump_unit = cap.get(cv2.CAP_PROP_FRAME_COUNT) // num_processes
        cap.release()
    
        # the context manager shuts the worker processes down once map() returns
        with mp.Pool(num_processes) as p:
            p.map(process_video, range(num_processes))
    
        # merging: load each segment's partial result, then add the MSE for the
        # frame pair at the segment boundary, which no worker computed
    
        final_mse = []
        for i in range(num_processes):
            na = np.load(f'data/{i}.npy')
            final_mse.extend(na)
    
    
            try:
                cap = cv2.VideoCapture(filename) # you could also take it outside the loop to reduce some overhead
                frame_no = (frame_jump_unit) * (i+1) - 1
                print(frame_no)
                cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
                _, frame1 = cap.read()  # last frame of segment i
                _, frame2 = cap.read()  # first frame of segment i+1
                # cast to float first so the uint8 difference does not wrap around
                diff = frame1.astype(np.float32) - frame2.astype(np.float32)
                c_mse = np.mean(np.square(diff))
                final_mse.append(c_mse)
                cap.release()
            except Exception:
                # after the last segment there may be no following frame to compare with
                print('failed in 1 case')
    
    
    
    
        t2 = time.time()
    
        print(t2-t1)
    
        np.save('data/final_mse.npy', np.array(final_mse))
    
    
    

    I'm just using np.save to store the partial results; you could use something better.
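
    One possible alternative to writing partial .npy files is to let each worker return its values and merge them in the parent process, since a pool's map returns results in task order. The sketch below is an illustration under simplifying assumptions: it uses synthetic in-memory frames instead of a video file, and the names segment_mse and merged_mse are hypothetical. The map function is passed in as a parameter, so the same code runs serially with the built-in map or in parallel with a pool's map method.

```python
import numpy as np


def segment_mse(task):
    """Compute the MSE of every neighboring pair in one segment.

    `task` is (frames, start, stop); pair k compares frames[k] and frames[k + 1].
    """
    frames, start, stop = task
    values = []
    for k in range(start, stop):
        diff = frames[k].astype(np.float32) - frames[k + 1].astype(np.float32)
        values.append(float(np.mean(diff * diff)))
    return values


def merged_mse(frames, ranges, map_fn=map):
    """Run segment_mse over all ranges and concatenate the results in order."""
    tasks = [(frames, start, stop) for start, stop in ranges]
    merged = []
    for part in map_fn(segment_mse, tasks):
        merged.extend(part)
    return np.array(merged)


# Four synthetic 2x2 frames with constant pixel values 0, 10, 30 and 60.
frames = np.stack([np.full((2, 2), v, dtype=np.uint8) for v in (0, 10, 30, 60)])
result = merged_mse(frames, [(0, 2), (2, 3)])
print(result)
```

    To run this in parallel, pass the map method of a multiprocessing.Pool as map_fn, with the pool created under the if __name__ == "__main__": guard (required on Windows). With a real video, each worker would open the file and seek to its own segment rather than receive the frames as an argument.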

    This one runs in 49.56 seconds with cpu_count = 12 on my machine. There are certainly some bottlenecks that could be removed to make it run faster.

    The one gap in the segmented approach is the MSE at the segment boundaries, where the last frame of one segment meets the first frame of the next. It is easy to add: OpenCV can seek directly to a given frame index, so we go to those positions, calculate the MSE separately, and merge it into the final result. The updated code above includes this merging step.

    You can write a simple sanity check to ensure that both versions produce the same result.

    import numpy as np
    
    a = np.load('data/sp.npy')
    
    b = np.load('data/final_mse.npy')
    
    print(a.shape)
    
    print(b.shape)
    
    print(a[:10])
    
    print(b[:10])
    
    # print every index where the two results differ
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            print(i)
    

    Some additional speedups can come from using a CUDA-compiled OpenCV, FFmpeg, adding a queuing mechanism on top of multiprocessing, etc.
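
    To give a rough idea of the queuing mechanism mentioned above, here is a minimal sketch of a reader thread that fills a bounded queue while the main thread computes the MSE. It is an illustration only: the frames are synthetic NumPy arrays standing in for the output of cap.read(), and the names reader, consume and queued_mse are hypothetical.

```python
import queue
import threading

import numpy as np

_DONE = object()  # sentinel that marks the end of the stream


def reader(frames, q):
    """Producer: push frames into the queue, then signal completion."""
    for frame in frames:
        q.put(frame)  # blocks while the queue is full
    q.put(_DONE)


def consume(q):
    """Consumer: compute the MSE between each frame and its predecessor."""
    mse = []
    prev = None
    while True:
        frame = q.get()  # blocks until a frame is available
        if frame is _DONE:
            break
        if prev is not None:
            diff = prev.astype(np.float32) - frame.astype(np.float32)
            mse.append(float(np.mean(diff * diff)))
        prev = frame
    return mse


def queued_mse(frames, maxsize=8):
    """Read frames on a background thread and process them on this one."""
    q = queue.Queue(maxsize=maxsize)
    t = threading.Thread(target=reader, args=(frames, q), daemon=True)
    t.start()
    result = consume(q)
    t.join()
    return result


# Three synthetic frames with constant pixel values 0, 10 and 30.
frames = [np.full((2, 2), v, dtype=np.uint8) for v in (0, 10, 30)]
print(queued_mse(frames))
```

    In a real pipeline the reader would decode frames from the video file, so decoding and computation can overlap; each multiprocessing worker could run such a reader for its own segment.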