matlab matrix distance mahalanobis

Mahalanobis distance for a matrix (mxn) with m<<n


I have a 12x202 matrix (12 instances, each with 202 features). I want to calculate the Mahalanobis distance between each pair of the 12 instances, but it seems that the number of columns cannot be much larger than the number of instances (rows). I had no problem calculating the distance for a 12x11 matrix, but more than 11 features causes an error in MATLAB with any of linkage(X,'ward','mahalanobis'), mahal(X,X), or pdist2(X,X,'mahalanobis').


Solution

  • If you look in the MATLAB documentation for the mahal function, it says:

    X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.

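    I'm not very good at statistics, but as far as I can tell this condition is not about efficiency. The distance needs the inverse of the sample covariance matrix, and with n rows that matrix has rank at most n-1. With 12 rows and 202 columns the 202x202 covariance matrix is therefore singular and cannot be inverted, which fits your observation that 12x11 works and anything wider fails. Twelve measurements is also very few for this many features, so consider collecting more.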
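
    As a rough illustration of where the row/column condition comes from, the sketch below checks the rank of the sample covariance of a 12x202 matrix. The random data is only a stand-in for the real features (an assumption made here). With n rows, the centred data spans at most n-1 directions, so the covariance cannot have full rank.

    ```matlab
    rng(0);                      % fixed seed so the run is repeatable
    X = randn(12, 202);          % stand-in data: 12 instances, 202 features
    S = cov(X);                  % 202x202 sample covariance
    r = rank(S);                 % at most 12 - 1 = 11 for any 12-row input
    fprintf('covariance is %dx%d, rank %d\n', size(S, 1), size(S, 2), r);
    ```

    A rank-deficient matrix has no inverse, which is presumably why the built-in functions reject this input.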

    What you could do is compute the Mahalanobis distance yourself. The formula is easy to find in the same documentation, and the example given there is a better illustration of the Mahalanobis distance:

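    Mahalanobis distance is also called quadratic distance. It measures the separation of two groups of objects. Suppose we have two groups with means $\bar{x}_1$ and $\bar{x}_2$; the Mahalanobis distance is given by the following:

    $$d = \sqrt{(\bar{x}_1 - \bar{x}_2)^T \, S^{-1} \, (\bar{x}_1 - \bar{x}_2)}$$

    where $S$ is the pooled covariance matrix of the two groups.

    So this formula is for two different groups, not for points within the same group.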

    In any case you could use this:

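    function MD = my_MahalanobisDistance(X, Y)
    % Mahalanobis distance between the means of two groups X and Y
    % (rows are observations, columns are features).

    [nX, mX] = size(X);
    [nY, mY] = size(Y);

    n = nX + nY;

    if mX ~= mY
        % Raise an error so that MD is never returned unassigned
        error('Columns in X must be same as in Y');
    end

    xDiff = mean(X, 1) - mean(Y, 1);   % difference of group means (row vector)
    cX = my_covariance(X);
    cY = my_covariance(Y);
    pC = nX/n*cX + nY/n*cY;            % pooled covariance matrix
    MD = sqrt(xDiff * (pC \ xDiff'));  % solve the system instead of forming inv(pC)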
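
    A small worked example of the same pooled-covariance calculation, written inline so that it runs on its own. The two groups are made up for illustration: they are chosen so that each group's covariance is the identity and the means differ by (3, 4), which gives a distance of 5.

    ```matlab
    X = [0 0; 2 0; 0 2; 2 2];            % group 1, mean (1, 1)
    Y = X + repmat([3 4], 4, 1);         % group 2, same shape shifted by (3, 4)
    nX = size(X, 1);  nY = size(Y, 1);  n = nX + nY;

    Xc = X - repmat(mean(X, 1), nX, 1);  % centre each group
    Yc = Y - repmat(mean(Y, 1), nY, 1);
    cX = Xc' * Xc / nX;                  % population covariance, here eye(2)
    cY = Yc' * Yc / nY;
    pC = nX/n*cX + nY/n*cY;              % pooled covariance

    xDiff = mean(X, 1) - mean(Y, 1);     % (-3, -4)
    MD = sqrt(xDiff * (pC \ xDiff'));    % 5
    ```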
    

    and for the covariance:

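    function C = my_covariance(X)
    % Population covariance (normalised by n, not n-1); rows are observations.
    n = size(X, 1);
    Xc = X - repmat(mean(X, 1), n, 1);   % centre each column
    C = Xc' * Xc / n;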
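
    To sanity-check a hand-written covariance like this one, it can be compared against MATLAB's built-in cov. The sketch below relies on cov(X,1) normalising by n rather than the default n-1, which matches the division by n above. The sample matrix is arbitrary.

    ```matlab
    X = [1 2; 3 4; 5 7; 8 6];            % arbitrary sample: 4 observations, 2 features
    n = size(X, 1);
    Xc = X - repmat(mean(X, 1), n, 1);   % centre each column
    C = Xc' * Xc / n;                    % hand-written population covariance
    maxDiff = max(max(abs(C - cov(X, 1))));
    fprintf('largest difference from cov(X,1): %g\n', maxDiff);
    ```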
    

    I hope this helps.