python, python-3.x, machine-learning, signal-processing, motion-detection

Designing a motion start/stop detector based on distance/velocity/acceleration


I trained YOLOv8 nano to detect fish embryos swimming in a petri dish. There is only ever one embryo in a dish at any given time, so this is a fairly simple task and the model performs well (mAP50 = 0.994). The ultimate goal of my project is software that takes a video as input and outputs metrics (x, y coordinates at each frame, swim distance, swim velocity, etc.) ONLY for the frames in which the embryo is swimming. For example, a video might be 200 frames long: for roughly the first 40 frames the embryo is not yet swimming, then it swims for 140 frames, and then it is still for the last 20 frames. For this video I would want a function that extracts only the 140 relevant frames from a CSV file containing information for every frame in the video.
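The velocity metric is presumably derived from the bounding-box centers: the frame-to-frame displacement in pixels, scaled by the pixel size and the frame rate. A sketch of that computation; the function name `speed_mm_per_s`, the 0.5 mm/pixel scale and the 20 fps rate are hypothetical values, chosen so that a 1-pixel step equals 10 mm/s.

```python
import numpy as np


def speed_mm_per_s(x_px, y_px, mm_per_px, frame_rate):
    """Per-frame speed from bounding-box centers given in pixels.

    Returns an array one element shorter than the inputs: entry k is the
    speed between frame k and frame k + 1.
    """
    dx = np.diff(np.asarray(x_px, dtype=float))
    dy = np.diff(np.asarray(y_px, dtype=float))
    # pixels/frame -> mm/frame -> mm/s
    return np.hypot(dx, dy) * mm_per_px * frame_rate


# Hypothetical calibration: 0.5 mm per pixel at 20 fps, so 1 px/frame = 10 mm/s
MM_PER_PX = 0.5
FPS = 20

# A still fish whose detected center jitters by 1-2 px between frames
x = [100, 101, 100, 102, 101]
y = [50, 50, 51, 50, 50]
print(speed_mm_per_s(x, y, MM_PER_PX, FPS))
```

Under these assumptions the jitter alone gives displacements of 1, √2, √5 and 1 pixels, i.e. speeds of about 10, 14.1, 22.4 and 10 mm/s, even though the fish never moves.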

The main issue with using a hard-coded algorithm for this is that the data is noisy, which makes the end of an embryo's swim hard to detect. For example, the smallest non-zero velocity that can be measured (a displacement of 1 pixel between frames) is usually around 10 mm/s. However, random variability in the model's predictions shifts the bounding box's center by a few pixels even when the fish is still, so the noise is around 10-20 mm/s, which is at or above that smallest real movement. For this reason, I applied simple exponential smoothing to the velocity column to try to reduce the noise:

import numpy as np

def simple_exponential_smoothing(data, alpha):
    """
    Apply simple exponential smoothing to the data.

    Parameters:
    data (array-like): The input time series data.
    alpha (float): The smoothing factor (0 < alpha <= 1).

    Returns:
    np.ndarray: The smoothed time series data.
    """
    result = [data[0]]  # First value is same as series
    for n in range(1, len(data)):
        result.append(alpha * data[n] + (1 - alpha) * result[n-1])
    return np.array(result)
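Simple exponential smoothing is a first-order recursive (IIR) low-pass filter, so the same result can be obtained with `scipy.signal.lfilter`. Written this way, its main side effect for start/end detection is easier to see: it is causal, so it delays every change in the signal. A sketch, assuming SciPy is available; `ses_lfilter` is a hypothetical name.

```python
import numpy as np
from scipy.signal import lfilter


def ses_lfilter(data, alpha):
    """Simple exponential smoothing as a first-order IIR filter.

    y[n] = alpha * x[n] + (1 - alpha) * y[n-1], with y[0] = x[0].
    """
    x = np.asarray(data, dtype=float)
    b = [alpha]              # feed-forward coefficient on x[n]
    a = [1.0, -(1 - alpha)]  # feedback term: y[n] - (1 - alpha) * y[n-1]
    # Initial filter state chosen so that the first output equals x[0]
    zi = [(1 - alpha) * x[0]]
    y, _ = lfilter(b, a, x, zi=zi)
    return y


step = np.r_[np.zeros(10), np.ones(10)]  # velocity jumps at frame 10
smoothed = ses_lfilter(step, alpha=0.3)
print(np.argmax(smoothed >= 0.5), np.argmax(smoothed >= 0.9))  # 11 16
```

With alpha = 0.3, a velocity step at frame 10 first exceeds half its height at frame 11 and 90% of it at frame 16, so a fixed threshold on the smoothed column fires late, and by an amount that depends on the threshold.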

My initial approach is to take a CSV file (containing one video's predictions, one row per frame) and run a "detector" function on it. I attempted to use the following function to find the start and end frames so that I can trim the data to only the relevant frames for further calculations:

def find_start_end_rows(df, velocity_column, filtered_velocity_column, frame_rate):
    """
    Find the start and end row indices of the swim.

    Parameters:
    df (pd.DataFrame): The dataframe to analyze (one row per frame, default integer index).
    velocity_column (str): The name of the raw velocity column.
    filtered_velocity_column (str): The name of the filtered velocity column.
    frame_rate (float): The video frame rate (currently unused).

    Returns:
    tuple: (start_row, end_row), or (-1, -1) if no swim was detected.
    """
    start_row = None
    end_row = None
    velocity_threshold = 20  # Minimum raw velocity (mm/s) that marks the start of a swim
    filtered_velocity_threshold = 10  # Filtered velocity (mm/s) below which the fish counts as still
    consistent_low_velocity_frames = 5  # Consecutive low-velocity frames needed to detect the end

    # Start: one frame before the first frame at or above the threshold
    # (clamped to 0 so a swim on frame 0 is not confused with the -1 failure value)
    for i in range(len(df)):
        if df.loc[i, velocity_column] >= velocity_threshold:
            start_row = max(i - 1, 0)
            break

    # If start_row is still None, no frame reached the threshold
    if start_row is None:
        return (-1, -1)  # -1 indicates the function failed

    # End: the frame just before the first run of consecutive low filtered velocities
    low_velocity_count = 0
    for i in range(start_row + 2, len(df)):
        if df.loc[i, filtered_velocity_column] < filtered_velocity_threshold:
            low_velocity_count += 1
            if low_velocity_count >= consistent_low_velocity_frames:
                end_row = i - consistent_low_velocity_frames
                break
        else:
            low_velocity_count = 0

    # No sustained low-velocity run: assume the swim lasts until the last frame
    if end_row is None:
        end_row = len(df) - 1

    return start_row, end_row
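Because the rule only looks at thresholds and run lengths, it can also be written with NumPy on plain arrays, which makes it cheap to sweep the thresholds over many annotated videos and compare the results against the true start/end frames. A sketch of the same rule (with the start index clamped at 0), followed by the trimming step; the name `find_start_end_vectorized` and the example velocities are hypothetical.

```python
import numpy as np


def find_start_end_vectorized(velocity, filtered_velocity,
                              velocity_threshold=20.0,
                              filtered_velocity_threshold=10.0,
                              consistent_low_velocity_frames=5):
    """Threshold/run-length start-end rule on plain arrays.

    Returns (-1, -1) if no frame reaches velocity_threshold.
    """
    v = np.asarray(velocity, dtype=float)
    fv = np.asarray(filtered_velocity, dtype=float)

    above = np.flatnonzero(v >= velocity_threshold)
    if above.size == 0:
        return -1, -1
    # One frame before the first fast frame, clamped at 0
    start_row = max(int(above[0]) - 1, 0)

    # Search for the end from start_row + 2, as the loop version does
    search_from = start_row + 2
    low = fv[search_from:] < filtered_velocity_threshold
    n = consistent_low_velocity_frames
    if low.size >= n:
        # Count of low frames in every window of n consecutive frames
        run_sums = np.convolve(low.astype(int), np.ones(n, dtype=int), mode="valid")
        full = np.flatnonzero(run_sums == n)
        if full.size:
            # full[0] is the offset of the first frame of the first all-low run;
            # the end of the swim is the frame just before it
            return start_row, search_from + int(full[0]) - 1
    return start_row, len(v) - 1


v = np.array([0, 0, 0, 25, 30, 30, 30, 5, 5, 5, 5, 5, 5], dtype=float)
start, end = find_start_end_vectorized(v, v)
print(start, end)              # 2 6
swim_only = v[start:end + 1]   # inclusive slice, like df.iloc[start:end + 1]
```

For the 200-frame example in the question, a start of 40 and an end of 179 would keep exactly 179 - 40 + 1 = 140 rows.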

As the graph below shows, however, this function does not perform very well. The graph shows the error of this function's start-frame and end-frame predictions relative to the true start/end frames of each video. For this project, the error in predicting start/end frames must be at most 2-3 frames.

[Plot showing find_start_end_frames error]

What approach would be best for detecting start/end frames within a video? It would be great to solve this algorithmically rather than train a whole other ML model for this task, but I am open to any solution people think might work.


Solution

  • You probably just need to apply more sophisticated filtering to the position data to suppress the noise and make the real motion stand out.

    The noise is probably spread fairly evenly across frequencies, but the "difference between frames" operation is a high-pass operation that accentuates the high-frequency noise. Fish swimming should be fairly low-frequency, though, so a good low-pass filter with an appropriate cut-off frequency should isolate it well.

    The other thing that would really help is increasing your frame rate. This spreads the noise over a wider frequency range, so less of it overlaps with the real motion and it is easier to filter out.

    Do an FFT or spectrogram of your position data to see what the frequency distribution currently looks like so you know what to filter out, and then you can use a tool like this one to design an IIR filter.

    You should apply something like a 2nd-order Butterworth low-pass filter to your data in both the forward and backward directions (zero-phase filtering) so that there is no time shift. Do the "difference between frames" operation either before or after filtering (for a linear filter it doesn't matter which) to produce a cleanly filtered "fish velocity" signal.
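Both spectral claims in this answer (frame differencing is a high-pass operation, and a higher frame rate leaves less noise inside a fixed pass band) can be checked numerically. The sketch below computes the magnitude response of the frame difference, and the fraction of white noise that a zero-phase Butterworth low-pass lets through at two frame rates. It assumes the per-frame detection noise is white with the same variance at any frame rate; the 3 Hz cutoff and the 30 and 120 fps rates are illustrative values, not taken from the question.

```python
import numpy as np
from scipy.signal import butter, freqz


def difference_gain(freq_hz, fs):
    """|H(f)| of the frame difference y[n] = x[n] - x[n-1], sampled at fs."""
    _, h = freqz([1.0, -1.0], [1.0], worN=np.array([freq_hz], dtype=float), fs=fs)
    return float(np.abs(h[0]))


def lowpass_noise_fraction(fs, cutoff_hz, order=2):
    """Fraction of white-noise variance left after a zero-phase Butterworth low-pass.

    Forward-backward filtering squares the magnitude response, so the noise
    power that survives is |H|**4 averaged over 0..Nyquist.
    """
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    _, h = freqz(b, a, worN=8192, fs=fs)
    return float(np.mean(np.abs(h) ** 4))


fs = 30.0
# Frame differencing: gain 2*|sin(pi*f/fs)|, i.e. 0 at DC and 2 at Nyquist
print(difference_gain(1.0, fs), difference_gain(fs / 2, fs))

# Same 3 Hz cutoff at 4x the frame rate: roughly 4x less noise variance survives
low_fps = lowpass_noise_fraction(30.0, 3.0)
high_fps = lowpass_noise_fraction(120.0, 3.0)
print(low_fps, high_fps, high_fps / low_fps)
```

At 30 fps a 1 Hz component is scaled by about 0.21 while content at 15 Hz is doubled, so differencing boosts high-frequency noise roughly tenfold relative to slow motion. The variance ratio comes out near 1/4, consistent with the same noise being spread over a band four times wider.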
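For the step of looking at the frequency content of the position data, a Welch power spectral density estimate (which averages FFTs of overlapping, windowed segments) is a quick starting point. A minimal sketch with `scipy.signal.welch`; the helper name `position_psd` is hypothetical, and the synthetic trace (a 1.5 Hz oscillation plus Gaussian jitter) is only a stand-in for real tracking data.

```python
import numpy as np
from scipy.signal import welch


def position_psd(x_px, y_px, fs, nperseg=256):
    """Welch PSD of the x and y position traces, summed over both axes."""
    x = np.asarray(x_px, dtype=float)
    y = np.asarray(y_px, dtype=float)
    nperseg = min(nperseg, len(x))
    # welch removes each segment's mean by default (detrend="constant")
    freqs, pxx = welch(x, fs=fs, nperseg=nperseg)
    _, pyy = welch(y, fs=fs, nperseg=nperseg)
    return freqs, pxx + pyy


# Synthetic stand-in: 20 s at 30 fps, slow 1.5 Hz motion in x plus 1-px jitter
rng = np.random.default_rng(0)
fs = 30.0
t = np.arange(600) / fs
x = 100 + 20 * np.sin(2 * np.pi * 1.5 * t) + rng.normal(0, 1.0, t.size)
y = 50 + rng.normal(0, 1.0, t.size)

freqs, psd = position_psd(x, y, fs)
peak_hz = freqs[np.argmax(psd)]
# In this synthetic trace the motion is a peak near 1.5 Hz, and the spectrum
# well above it is the flat noise floor of the jitter
print(peak_hz)
```

On real data, the frequency where the spectrum flattens into the noise floor is a reasonable first guess for the low-pass cutoff.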
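The zero-phase filtering step can be sketched with SciPy: design a 2nd-order Butterworth low-pass with `scipy.signal.butter`, apply it forward and backward with `scipy.signal.sosfiltfilt`, then difference. The 2 Hz cutoff, 0.5 mm/pixel scale, 30 fps rate, 0.7-pixel jitter and the synthetic 200-frame trajectory (40 still, 140 swimming, 20 still, echoing the question's example) are assumptions for illustration; the cutoff should come from the spectrum of the real data.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def filtered_speed(x_px, y_px, fs, cutoff_hz, mm_per_px, order=2):
    """Speed in mm/s from zero-phase low-pass filtered positions.

    sosfiltfilt runs the filter forward and then backward, which cancels the
    phase delay, so motion onsets are not shifted later in time.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    xf = sosfiltfilt(sos, np.asarray(x_px, dtype=float))
    yf = sosfiltfilt(sos, np.asarray(y_px, dtype=float))
    # speed[k] is the speed between frames k and k + 1
    return np.hypot(np.diff(xf), np.diff(yf)) * mm_per_px * fs


def raw_speed(x_px, y_px, fs, mm_per_px):
    """Unfiltered frame-to-frame speed, for comparison."""
    return np.hypot(np.diff(x_px), np.diff(y_px)) * mm_per_px * fs


# Hypothetical trajectory: still for 40 frames, swims 2 px/frame for 140 frames,
# then still for 20 frames, with 0.7 px of detection jitter on each axis
rng = np.random.default_rng(1)
fs, mm_per_px = 30.0, 0.5
frames = np.arange(200)
x_true = 100 + 2.0 * np.clip(frames - 39, 0, 140)
x = x_true + rng.normal(0, 0.7, frames.size)
y = 50 + rng.normal(0, 0.7, frames.size)

v_raw = raw_speed(x, y, fs, mm_per_px)
v_filt = filtered_speed(x, y, fs, cutoff_hz=2.0, mm_per_px=mm_per_px)
print(np.median(v_raw[5:30]), np.median(v_filt[5:30]))  # still phase: noise level
print(np.median(v_filt[60:160]))                        # swimming phase
```

In the still phase the raw speed is comparable to the 10-20 mm/s noise described in the question, while the filtered speed drops to a few mm/s; during swimming the filtered value stays close to the true 30 mm/s. Because the filter is zero-phase, the rise in filtered speed stays centered on the true onset instead of lagging behind it. A lower cutoff removes more noise but spreads each start and stop over more frames.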