Tags: python, algorithm, search, graph-traversal

Efficient algorithm to find the position of a vector in a 2D array


I have an interesting problem but cannot find the relevant literature, so I need algorithm suggestions, or just the correct way to phrase the problem so I can find the literature.

I'm given a vector of floats of varying length coming from a 2D array. I know the orientation (i.e. which axis I'm looking in). I need to find the start index of the vector. Essentially, I'm cross-correlating a short vector with a long vector.

I have implemented a brute force approach but naturally it grows as O(n^2) where n is the dimension of the 2D array.

Q1) What is an efficient algorithm to tackle this problem?

Q2) What is this type of problem called (so I can find relevant papers)?

There is measurement error so it will never be an exact match.

Here is the brute-force approach, where I look for the minimum of the norm of the difference of the two vectors:

import time
import numpy as np


def get_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Calculate the Euclidean distance between two vectors."""
    return np.linalg.norm(a - b)


def find_min_distance_subvector(array: np.ndarray, x: np.ndarray) -> tuple[int, float]:
    leng = len(x)
    min_index = 0
    min_distance = np.inf

    # Assuming we know the orientation of the vector
    for ii in range(len(array) - leng + 1):  # +1 so the last valid start index is included
        # Extract a sub-vector of the same length as x, starting from index ii
        sub_vec = array[ii:ii + leng]
        dist = get_distance(sub_vec, x)
        if dist < min_distance:
            min_distance = dist
            min_index = ii
    return min_index, min_distance


def main():
    leng = 100
    size = 2000

    # Create the search map
    arr = np.random.random((size, size))

    # Pick a random sub-vector from the map
    index = np.random.randint(0, size - leng)
    x = arr[index:index + leng]

    start_time = time.time()
    min_index, min_metric = find_min_distance_subvector(arr, x)
    end_time = time.time()
    print(f"Minimum distance: {min_metric} at index {min_index}, correct index: {index}, "
          f"time taken: {end_time - start_time:.4f} seconds")


if __name__ == '__main__':
    main()

Thank you for your help


Solution

  • To improve on the O(n^2) of the brute force, consider the following approaches:

    Sliding sums: instead of recalculating the distance for every sub-vector from scratch, expand the squared distance as ||a - x||^2 = ||a||^2 - 2(a . x) + ||x||^2. The window norms ||a||^2 can then be maintained with a running (prefix) sum as you slide through the array, leaving only the cross term a . x to compute per offset.

    Dynamic programming: if the problem allows, store previously computed partial sums and reuse them across overlapping windows, which removes redundant calculations.

    FFT-based cross-correlation: the sliding dot products between the vector and every window of a row can all be computed at once with an FFT-based cross-correlation, reducing the per-row cost from O(n*m) to O(n log n).

    KMP: if the measurements can be quantised so that matches become exact, the problem turns into substring search, and the Knuth-Morris-Pratt algorithm finds occurrences of the pattern in O(n + m). Note that KMP requires exact equality, so it does not apply directly to noisy floats.

    LSH: for high-dimensional data, locality-sensitive hashing can approximate nearest-neighbour search; hashing every window and comparing x only against windows that land in the same bucket quickly narrows down candidate start indices.
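The sliding-sum and correlation ideas combine naturally. Here is a minimal sketch for one row, assuming NumPy only (`np.correlate` is used for clarity; it is O(n*m), and an FFT-based correlation such as `scipy.signal.fftconvolve(row, x[::-1], mode='valid')` would give the O(n log n) version):

```python
import numpy as np


def find_min_distance_corr(row: np.ndarray, x: np.ndarray) -> tuple[int, float]:
    """Find the offset i minimising ||row[i:i+m] - x||, using
    ||a - x||^2 = ||a||^2 - 2(a . x) + ||x||^2:
    window norms come from a prefix sum, sliding dot products
    from a cross-correlation."""
    n, m = len(row), len(x)
    # Sliding dot products row[i:i+m] . x for every valid offset i.
    dots = np.correlate(row, x, mode="valid")
    # Sliding squared window norms ||row[i:i+m]||^2 via a prefix sum.
    csum = np.concatenate(([0.0], np.cumsum(row * row)))
    win_sq = csum[m:] - csum[:-m]
    # Squared distances for all n - m + 1 offsets at once.
    d2 = win_sq - 2.0 * dots + np.dot(x, x)
    i = int(np.argmin(d2))
    # Clip tiny negative values caused by floating-point cancellation.
    return i, float(np.sqrt(max(d2[i], 0.0)))
```

With measurement noise this still returns the best-matching offset, since the minimiser of the squared distance is unchanged by the decomposition.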
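For the substring-search framing, a classic KMP sketch is shown below, under the assumption that the floats have first been quantised to discrete symbols (with raw measurement noise, exact matching does not apply):

```python
def kmp_search(text, pattern) -> int:
    """Knuth-Morris-Pratt exact-pattern search over any sequence of
    comparable symbols. Returns the first start index of pattern in
    text, or -1 if it does not occur."""
    m = len(pattern)
    # Failure function: fail[i] = length of the longest proper prefix
    # of pattern[:i+1] that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing already-matched prefix lengths on mismatch.
    k = 0
    for i, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            return i - m + 1
    return -1
```

The failure function is what avoids re-examining text characters, giving the O(n + m) bound.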
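And a toy random-hyperplane LSH sketch for the approximate route; the single hash table, `n_planes=16`, and the exact-bucket match are illustrative choices, not tuned values:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lsh_candidates(row: np.ndarray, x: np.ndarray,
                   n_planes: int = 16, seed: int = 0) -> tuple[int, float]:
    """Bucket every window of row by its random-hyperplane sign
    signature, then score only the windows sharing x's bucket."""
    m = len(x)
    windows = sliding_window_view(row, m)        # (n - m + 1, m) view, no copy
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, m))
    # Signature = sign pattern of the projections onto the random planes.
    sigs = windows @ planes.T > 0                # (n - m + 1, n_planes) bools
    target = planes @ x > 0
    # Candidates: windows whose signature matches the query's exactly.
    cand = np.flatnonzero((sigs == target).all(axis=1))
    if cand.size == 0:                           # fall back to a full scan
        cand = np.arange(windows.shape[0])
    d = np.linalg.norm(windows[cand] - x, axis=1)
    j = int(np.argmin(d))
    return int(cand[j]), float(d[j])
```

Because nearby vectors tend to fall on the same side of random hyperplanes, the true window usually survives the bucketing, while most of the n - m + 1 windows are filtered out before any distance is computed.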