The Matlab function as the pdist2
function that can compute the distance between each row in e.g L2-norm
. GNU Octave have the same function pdist2
.
The difference between those function is that Matlab can return the indexes [D, I] = pdist2(X, Y, 'norm')
while GNU Octave can only return the distances [D] = pdist2(X, Y, 'norm')
.
Question:
Assume that you have matrix X
X = [-0.2515 1.0451 -1.2817
-1.9741 0.2782 -1.0234
10.6772 10.7365 9.9264
8.7785 11.1680 9.5915
-34.2330 -30.8410 -31.7200
-32.6290 -31.7860 -31.2900
45.0000 43.0000 0
23.0000 -3.0000 2.0000];
And you are using D = pdist2(X, X, "euclidean")
for the ecluidian distance
D = [0.0000000 1.9032127 18.4114208 17.3850403 55.6593018 55.0153084 61.7215919 23.8278294
1.9032127 0.0000000 19.7314529 18.6247940 54.3260574 53.7019653 63.5040855 25.3691425
18.4114208 19.7314529 0.0000000 1.9757286 74.0272751 73.3647156 48.1406441 20.0840893
17.3850403 18.6247940 1.9757286 0.0000000 72.9478302 72.3251266 49.1657410 21.4619255
55.6593018 54.3260574 74.0272751 72.9478302 0.0000000 1.9108551 112.8561935 72.0262222
55.0153084 53.7019653 73.3647156 72.3251266 1.9108551 0.0000000 112.2420197 70.9326706
61.7215919 63.5040855 48.1406441 49.1657410 112.8561935 112.2420197 0.0000000 51.0294037
23.8278294 25.3691425 20.0840893 21.4619255 72.0262222 70.9326706 51.0294037 0.0000000];
How can I produce the indexes I
if I don't have the access through the pdist2
function?
Following are 3 methods and they are tested for performance over a 10k square matrix.
With sort()
, takes ~1.11 seconds
function nearest_indices = find_nearest_indices(D)
% Find the indices of the nearest points for each point in the matrix.
% Args:
% D : A square matrix containing distances between points.
% Returns:
% An array of indices of the nearest points.
% Usage:
% nearest_indices = find_nearest_indices(D);
% Number of rows in D
n = size(D, 1);
% Preallocate array for indices
nearest_indices = zeros(n, 1);
for i = 1:n
% Sort the distances in the current row
[sorted_distances, sorted_indices] = sort(D(i, :));
% Find the index of the nearest neighbor
% The first element in sorted_indices is the point itself, so we take the second element
nearest_indices(i) = sorted_indices(2);
end
end
Without sort()
, takes ~1.08 seconds
function nearest_indices = find_nearest_indices(D)
% Number of rows in D
n = size(D, 1);
% Preallocate array for indices
nearest_indices = zeros(n, 1);
for i = 1:n
% Copy the row and set the diagonal element (self-distance) to Inf
row = D(i, :);
row(i) = Inf;
% Find the index of the minimum non-zero distance
[~, min_index] = min(row);
nearest_indices(i) = min_index;
end
end
Without a for loop
, takes ~0.55 seconds
function nearest_indices = find_nearest_indices(D)
% Set the diagonal elements to Inf to ignore self-distances
n = size(D, 1);
D(sub2ind(size(D), 1:n, 1:n)) = Inf;
% Find the index of the minimum non-zero distance in each row
[~, nearest_indices] = min(D, [], 2);
end