I'm trying to calculate the
1) Euclidean distance, and
2) Mahalanobis distance
for a set of matricies in r. I've been doing it as such:
v1 <- structure(c(0.508, 0.454, 0, 2.156, 0.468, 0.488, 0.682, 1, 1.832,
0.44, 0.928, 0.358, 1, 1.624, 0.484, 0.516, 0.378, 1, 1.512,
0.514, 0.492, 0.344, 0, 1.424, 0.508, 0.56, 0.36, 1, 1.384, 0.776,
1.888, 0.388, 0, 1.464, 0.952, 0.252, 0.498, 1, 1.484, 0.594,
0.256, 0.54, 2, 2.144, 0.402, 0.656, 2.202, 1, 1.696, 0.252),
.Dim = c(5L, 10L),
.Dimnames = list(NULL, c("KW_1", "KW_2", "KW_3", "KW_4", "KW_5", "KW_6", "KW_7", "KW_8", "KW_9", "KW_10")))
v2 <- structure(c(1.864, 1.864, 1.864, 1.864, 1.864, 1.6, 1.6, 1.6,
1.6, 1.6, 1.536, 1.536, 1.536, 1.536, 1.536, 1.384, 1.384, 1.384,
1.384, 1.384, 6.368, 6.368, 6.368, 6.368, 6.368, 2.792, 2.792,
2.792, 2.792, 2.792, 2.352, 2.352, 2.352, 2.352, 2.352, 2.624,
2.624, 2.624, 2.624, 2.624, 1.256, 1.256, 1.256, 1.256, 1.256,
1.224, 1.224, 1.224, 1.224, 1.224),
.Dim = c(5L, 10L),
.Dimnames = list(NULL, c("KW_1", "KW_2", "KW_3", "KW_4", "KW_5", "KW_6", "KW_7", "KW_8", "KW_9", "KW_10")))
L2 <- sqrt(rowSums((v1-v2)^2)) # Euclidean distance for each row
which provides:
[1] 7.132452 7.568359 7.536904 5.448696 7.163580
That's perfect! But I've heard you can also compute Euclidean/L2 distance using the following form:
I'd like to calculate my distance this way because the Mahalanobis distance is simply this and the covariance matrix. See this.
I haven't figured out how to code this in r, however. I've tried:
sqrt(crossprod((t(v1)-t(v2))))
and
sqrt((v1-v2) %*% t(v1-v2))
But they just don't give me what I want. Suggestions?
Note -
I'm looking to do this as a single operation, not in a loop of any kind. It has to be very fast because I'm doing it over millions of rows multiple times. Maybe it's not possible. I'm open to changing the format of v1
and v2
.
You need to apply the formula to each row individually, so something like:
> sapply(1:nrow(v1), function(i) {
+ q = v1[i, ] - v2[i, ]
+ d = sqrt(t(q) %*% q)
+ d
+ })
[1] 7.132452 7.568359 7.536904 5.448696 7.163580
If you need something faster you can always try the same thing in C++ (code adapted from here):
#include <Rcpp.h>
using namespace Rcpp;
double dist2(NumericVector x, NumericVector y){
double d = sqrt( sum( pow(x - y, 2) ) );
return d;
}
// [[Rcpp::export]]
NumericVector calc_l2 (NumericMatrix x, NumericMatrix y){
int out_length = x.nrow();
NumericVector out(out_length);
for (int i = 0 ; i < out_length; i++){
NumericVector v1 = x.row(i);
NumericVector v2 = y.row(i);
double d = dist2(v1, v2);
out(i) = d;
}
return (out) ;
}
Running in R:
library(Rcpp)
sourceCpp("calc_L2.cpp")
calc_l2(v1, v2)