python algorithm performance numpy distance

Using NumPy to find the average distance in a set of points


I have an array of points in a space of arbitrary dimension, such as:

import numpy

data = numpy.array(
    [[115, 241, 314],
     [153, 413, 144],
     [535, 2986, 41445]])

and I would like to find the average Euclidean distance between all pairs of points.

Please note that I have over 20,000 points, so I would like to do this as efficiently as possible.

Thanks.


Solution

  • Well, I don't think there is a super fast way to do this, since with n points all n(n-1)/2 pairs have to be visited, but this should do it:

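    n = data.shape[0]
    tot = 0.
    
    # For each point i, add its distances to every later point (i+1 .. n-1),
    # so each unordered pair is counted exactly once.
    for i in range(n - 1):
        tot += ((((data[i+1:] - data[i])**2).sum(1))**.5).sum()
    
    # Divide by the number of pairs, n*(n-1)/2.
    avg = tot / ((n - 1) * n / 2.)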
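
    For reuse, the same loop can be wrapped in a function. This is only a minimal sketch, assuming Python 3 and NumPy imported as np. The name mean_pairwise_distance is made up for illustration, and the cast to float is an added precaution so that squaring large integer coordinates cannot overflow. The brute-force reference at the end is just a sanity check on the three sample points from the question.

```python
import numpy as np


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of rows.

    Hypothetical helper name; it uses the same loop as the answer above.
    """
    # Work in floating point so squared integer coordinates cannot overflow.
    pts = np.asarray(points, dtype=float)
    n = pts.shape[0]
    if n < 2:
        raise ValueError("need at least two points")

    total = 0.0
    for i in range(n - 1):
        # Differences between point i and every later point, shape (n-i-1, d).
        diff = pts[i + 1:] - pts[i]
        # Row-wise Euclidean norms, added to the running total.
        total += np.sqrt((diff ** 2).sum(axis=1)).sum()

    # There are n*(n-1)/2 unordered pairs.
    return total / (n * (n - 1) / 2.0)


data = np.array([[115, 241, 314],
                 [153, 413, 144],
                 [535, 2986, 41445]])

# Brute-force reference over explicit pairs; only sensible for tiny inputs.
pairs = [(0, 1), (0, 2), (1, 2)]
reference = sum(
    np.linalg.norm(data[a].astype(float) - data[b]) for a, b in pairs
) / len(pairs)

avg = mean_pairwise_distance(data)
```

    This keeps memory use at O(n) per iteration, which matters at 20,000 points. If SciPy happens to be available, scipy.spatial.distance.pdist(data).mean() should give the same number in one call, but it holds all n(n-1)/2 distances in memory at once.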