matlabindexingoctaveeuclidean-distancepdist

How to produce the indexes from pdist2 function in Octave?


The Matlab function as the pdist2 function that can compute the distance between each row in e.g L2-norm. GNU Octave have the same function pdist2.

The difference between those function is that Matlab can return the indexes [D, I] = pdist2(X, Y, 'norm') while GNU Octave can only return the distances [D] = pdist2(X, Y, 'norm').

Question:

Assume that you have matrix X

X = [-0.2515    1.0451   -1.2817
     -1.9741    0.2782   -1.0234
     10.6772   10.7365    9.9264
      8.7785   11.1680    9.5915
    -34.2330  -30.8410  -31.7200
    -32.6290  -31.7860  -31.2900
     45.0000   43.0000         0
     23.0000   -3.0000    2.0000];

And you are using D = pdist2(X, X, "euclidean") for the ecluidian distance

D = [0.0000000       1.9032127       18.4114208      17.3850403      55.6593018      55.0153084      61.7215919      23.8278294
     1.9032127       0.0000000       19.7314529      18.6247940      54.3260574      53.7019653      63.5040855      25.3691425
     18.4114208      19.7314529      0.0000000       1.9757286       74.0272751      73.3647156      48.1406441      20.0840893
     17.3850403      18.6247940      1.9757286       0.0000000       72.9478302      72.3251266      49.1657410      21.4619255
     55.6593018      54.3260574      74.0272751      72.9478302      0.0000000       1.9108551       112.8561935     72.0262222
     55.0153084      53.7019653      73.3647156      72.3251266      1.9108551       0.0000000       112.2420197     70.9326706
     61.7215919      63.5040855      48.1406441      49.1657410      112.8561935     112.2420197     0.0000000       51.0294037
     23.8278294      25.3691425      20.0840893      21.4619255      72.0262222      70.9326706      51.0294037      0.0000000];

How can I produce the indexes I if I don't have the access through the pdist2 function?


Solution

  • Following are 3 methods and they are tested for performance over a 10k square matrix.

    Iteration 1

    With sort(), takes ~1.11 seconds

    function nearest_indices = find_nearest_indices(D)
        % Find the indices of the nearest points for each point in the matrix.
        % Args:
        % D : A square matrix containing distances between points.
        % Returns:
        % An array of indices of the nearest points.
        % Usage:
        % nearest_indices = find_nearest_indices(D);
    
        % Number of rows in D
        n = size(D, 1);
        
        % Preallocate array for indices
        nearest_indices = zeros(n, 1);
    
        for i = 1:n
            % Sort the distances in the current row
            [sorted_distances, sorted_indices] = sort(D(i, :));
    
            % Find the index of the nearest neighbor
            % The first element in sorted_indices is the point itself, so we take the second element
            nearest_indices(i) = sorted_indices(2);
        end
    end
    

    Iteration 2

    Without sort(), takes ~1.08 seconds

    function nearest_indices = find_nearest_indices(D)
        % Number of rows in D
        n = size(D, 1);
        
        % Preallocate array for indices
        nearest_indices = zeros(n, 1);
        for i = 1:n
            % Copy the row and set the diagonal element (self-distance) to Inf
            row = D(i, :);
            row(i) = Inf;
    
            % Find the index of the minimum non-zero distance
            [~, min_index] = min(row);
            nearest_indices(i) = min_index;
        end
    end
    

    Iteration 3

    Without a for loop, takes ~0.55 seconds

    function nearest_indices = find_nearest_indices(D)
        % Set the diagonal elements to Inf to ignore self-distances
        n = size(D, 1);
        D(sub2ind(size(D), 1:n, 1:n)) = Inf;
    
        % Find the index of the minimum non-zero distance in each row
        [~, nearest_indices] = min(D, [], 2);
    end