python distance-matrix

Gower's distance matrix (Python gower package) shows negative distances when the data has negative values


I am using the gower package in Python: https://pypi.org/project/gower/

When I calculate Gower's distance on data with negative values, e.g. minimum temperature, I get a distance matrix with negative values.

What does a negative value indicate? Is this still a normalised distance between 0 and 1, and can I use the absolute value of this distance the same way I would use a positive value?

Code

import numpy as np
import pandas as pd
import gower

Xd = pd.DataFrame({'mintemp':[-20.0, -15.3, -45.4, -0.5, -45]})
X = np.asarray(Xd)
print(gower.gower_topn(Xd.iloc[0:1,:], Xd, n=5))
print(gower.gower_matrix(X))

Solution

  • I took a look at the source code, and I think there is a bug in the calculation of range/max for numerical variables. Numerical variables are normalized by dividing by max, so distance = abs((xi - xj)/max) * (max/range). If the max is negative, max/range is negative, and so is the calculated distance.

    So yes, just take the absolute value of the Gower matrix and interpret it the same way you would for positive values. The absolute value reduces to abs(xi - xj)/range, which is the usual normalised distance between 0 and 1.
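
    To make the sign issue concrete, here is a minimal pure-Python sketch of the formula quoted above, applied to the mintemp column from the question. It follows this answer's description of the calculation and is not the gower package's actual code; the names quoted_distance and reference_distance are made up for illustration.

```python
# Minimal sketch of the formula quoted in the answer, for one numeric column.
# It mirrors the answer's description, NOT the gower package's actual code.
values = [-20.0, -15.3, -45.4, -0.5, -45.0]

col_max = max(values)                # -0.5, negative for this column
col_range = col_max - min(values)    # 44.9, never negative


def quoted_distance(xi, xj):
    """distance = abs((xi - xj) / max) * (max / range), as described above."""
    return abs((xi - xj) / col_max) * (col_max / col_range)


def reference_distance(xi, xj):
    """Usual Gower distance for a numeric feature: |xi - xj| / range."""
    return abs(xi - xj) / col_range


first = values[0]
for other in values:
    d = quoted_distance(first, other)
    # The sign comes only from max/range; the magnitude matches the reference.
    print(f"{d:+.4f}  abs -> {abs(d):.4f}  reference -> {reference_distance(first, other):.4f}")
```

    With a negative column maximum every quoted distance is zero or negative, and its absolute value matches the reference distance. For the code in the question, the workaround would be np.abs(gower.gower_matrix(X)).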