
Using Additional kwargs with a Custom Function for Scipy's cdist (or pdist)?


I am using a custom metric function with scipy's cdist function. The custom function is something like

def cust_metric(u,v):
  dist = np.cumsum(np.gcd(u,v) * k) 
  return dist

where k is an arbitrary coefficient.

Ideally, I was hoping to pass k as an argument when calling cdist, like so:

d_ar = scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))

However, this throws an error.

Is there a simple solution that I may be missing? A quick but inelegant fix is to declare k as a global variable and adjust it when needed.


Solution

  • According to the cdist documentation, the value for metric should be a callable (or a string naming one of a fixed set of built-in metrics). In your case, you can get a callable with k already bound by having cust_metric return a function of u and v:

    def cust_metric(k):
        return lambda u, v: np.cumsum(np.gcd(u, v) * k)
    

    I imagine your actual callable looks somewhat different: cdist calls the metric with one row from each input, so u and v are 1-D arrays and np.cumsum returns an array, while the callable is supposed to return a scalar. For example, with a metric that does return a scalar:

    In [25]: arr1 = np.array([[5, 7], [6, 1]])
    
    In [26]: arr2 = np.array([[6, 7], [6, 1]])
    
    In [28]: def cust_metric(k):
        ...:     return lambda u, v: np.sqrt(np.sum((k*u - v)**2))
        ...:
    
    In [29]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))
    Out[29]:
    array([[51.03920062, 56.08029957],
           [36.        , 36.49657518]])
    
    In [30]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=1))
    Out[30]:
    array([[1.        , 6.08276253],
           [6.        , 0.        ]])
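
If you would rather keep k as a regular parameter of the metric instead of wrapping it in a factory, a minimal sketch along the following lines may work. The name scaled_euclidean is made up for illustration and reuses the formula from the example above. functools.partial is standard library; forwarding k through cdist's extra keyword arguments is an assumption about your SciPy version (the documentation describes **kwargs as extra arguments to metric), so check it against the version you have installed.

```python
from functools import partial

import numpy as np
from scipy.spatial.distance import cdist


def scaled_euclidean(u, v, k=1):
    """Euclidean distance between k*u and v (hypothetical metric, for illustration)."""
    return np.sqrt(np.sum((k * u - v) ** 2))


arr1 = np.array([[5, 7], [6, 1]])
arr2 = np.array([[6, 7], [6, 1]])

# Bind k up front; cdist then sees an ordinary two-argument callable.
d_partial = cdist(arr1, arr2, metric=partial(scaled_euclidean, k=7))

# Assumption: this SciPy version forwards extra keyword arguments
# to a callable metric, so k can be given directly to cdist.
d_kwargs = cdist(arr1, arr2, metric=scaled_euclidean, k=7)
```

Both calls should give the same matrix as Out[29] above. The partial version does not depend on how cdist handles extra keyword arguments, which makes it the more conservative choice.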