Efficient Algorithems to find position of vector in 2D array

I have an interesting problem but cannot find the relevant literature, so need algorithem suggestions or just the correct way to phrase the problem so I can find the literature.

I'm given a vector of floats of varying length coming from a 2D array. I know the orientation (i.e. which axis I'm looking in). I need to find the start index of the vector. Essentially, I'm cross-correlating a short vector with a long vector.

I have implemented a brute force approach but naturally it grows as O(n^2) where n is the dimension of the 2D array.

Q1) What is an efficient algorithem to tackel this problem?

Q2) How is this type of problem called (so I can find relevant papers)?

There is measurement error so it will never be an exact match.

Here the brute force approach, where I look for the minimum of the norm of two vectors:

import time
import numpy as np


def get_distance(a: np.ndarray, b: np.ndarray) -> float:
    """ Calculate the distance between two vectors. """
    return  np.linalg.norm(a - b)


def find_min_distance_subvector(array: np.ndarray, x: np.ndarray) -> tuple[int, float]:
    leng = len(x)
    min_index = 0
    min_distance = np.inf

    # Assuming we know the orientation of the vector
    for ii in range(0, len(array) - leng):
        # Extract a sub-vector of the same length as x, starting from index ii
        sub_vec = array[ii:ii + leng]
        dist = get_distance(sub_vec, x)
        if dist < min_distance:
            min_distance = dist
            min_index = ii
    return min_index, min_distance


def main():
    leng = 100
    size = 2000

    # Create the search map
    arr = np.random.random((size, size))

    # Pick a random sub-vector from the map
    index = np.random.randint(0, size - leng)
    x = arr[index:index + leng]

    start_time = time.time()
    min_index, min_metric = find_min_distance_subvector(arr, x)
    end_time = time.time()
    print(f"Minimum distance: {min_metric} at index {min_index}, correct index: {index}, "
          f"time taken: {end_time - start_time:.4f} seconds")


if __name__ == '__main__':
    main()

Thank you for your help

Solution

For your case, to improve from O(n^2), consider following algorithms:

Sliding Window Technique:

Instead of recalculating the distance for every sub-vector from scratch, you can maintain a running distance calculation as you slide through the array. This can significantly reduce the number of calculations needed.

Dynamic Programming:

If the problem allows, you can use dynamic programming to store previously computed distances and reuse them, which can help in reducing redundant calculations.

Fast Fourier Transform (FFT):

If your vector and the 2D array have periodic properties, you might consider using FFT to find correlations efficiently. This can reduce the complexity to O(n log n).

KMP Algorithm (Knuth-Morris-Pratt):

If your problem can be framed as a substring search (where the vector is treated as a pattern), the KMP algorithm can be used to find occurrences of the vector in the 2D array efficiently.

Locality-Sensitive Hashing (LSH):

For high-dimensional data, LSH can be used to approximate nearest neighbors efficiently, which might help in finding the start index of the vector.