distribution, euclidean-distance, metric

Which distance function between two different variables?


What is the best metric to calculate the distance between two objects? The objects are represented by four parameters (the same two variables, measured before and after a procedure).

The goal is to group the data points of the whole set into pairs. Additionally, the data points in each pair should be as close (similar) to each other as possible.

The data

Normal and normalized distribution: they look the same, just with different magnitudes on the axes.

Feature scaling


Solution

  • You can use any distance function. Euclidean distance is the most common choice, but the best one depends heavily on the situation. If in doubt, you can try several functions and see which gives the most accurate results.

    One thing I would recommend is scaling your features (if you have more than one). By scaling your features, you decide how much "impact" each feature has in the Euclidean formula.

    If one feature appears to be much more important than another, you can scale it to a slightly larger range. If all your features should carry equal weight, scale them all to the same range of numbers, often [-0.5, 0.5].
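
    As a rough illustration of the scaling idea, here is a minimal sketch in plain Python. The data values, the function names `scale_columns` and `euclidean`, and the `weights` parameter are all made up for this example. It assumes min-max scaling to [-0.5, 0.5], which is only one of several reasonable ways to scale.

    ```python
    import math

    def scale_columns(rows, weights=None):
        """Min-max scale each column to [-0.5, 0.5], then apply an optional per-column weight."""
        n_cols = len(rows[0])
        if weights is None:
            weights = [1.0] * n_cols  # equal importance by default
        lows = [min(r[j] for r in rows) for j in range(n_cols)]
        highs = [max(r[j] for r in rows) for j in range(n_cols)]
        scaled = []
        for r in rows:
            new_row = []
            for j in range(n_cols):
                span = highs[j] - lows[j]
                # A constant column carries no information; map it to 0.
                unit = (r[j] - lows[j]) / span - 0.5 if span else 0.0
                new_row.append(weights[j] * unit)
            scaled.append(new_row)
        return scaled

    def euclidean(a, b):
        """Plain Euclidean distance between two equal-length vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Hypothetical objects: (var1_before, var2_before, var1_after, var2_after)
    objects = [
        (1.0, 200.0, 2.0, 400.0),
        (2.0, 210.0, 3.0, 390.0),
        (9.0, 800.0, 8.0, 100.0),
    ]

    scaled = scale_columns(objects)
    d01 = euclidean(scaled[0], scaled[1])  # objects 0 and 1 are similar
    d02 = euclidean(scaled[0], scaled[2])  # objects 0 and 2 are far apart
    ```

    Without scaling, the second and fourth columns (values in the hundreds) would dominate the distance. To make one feature count more, pass something like `weights=[2.0, 1.0, 1.0, 1.0]`; the factor 2.0 is an arbitrary illustration, not a recommended value.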