python similarity hierarchical-clustering scipy-spatial

How to compute similarities between arrays?


I am trying to compute the similarity between two samples. The Python functions sklearn.metrics.pairwise.cosine_similarity and scipy.spatial.distance.cosine return results that I am not satisfied with. For example:

Is there a way in Python to achieve those expected results?


Solution

  • I think you are misunderstanding what the function computes. From your description, you want to compute the misclassification error / accuracy. However, the function receives two samples u and v and computes the cosine distance between them. In your first example:

    tt1 = [1, 16, 4, 21]
    tt2 = [5, 17, 3, 22]
    

    then u=tt1 and v=tt2. The values in each array are the coordinates of that sample in the vector space (here a 4-dimensional space), not different samples. Refer to the function documentation, specifically the examples at the bottom.
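
To make this concrete, here is a minimal pure-Python sketch of what a cosine distance between two 4-dimensional vectors amounts to. The helper name cosine_similarity below is my own, written only for illustration (it is not the sklearn function of the same name); scipy.spatial.distance.cosine(u, v) is documented as one minus this quantity.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Each list is ONE sample; its four entries are coordinates, not four samples.
tt1 = [1, 16, 4, 21]
tt2 = [5, 17, 3, 22]

similarity = cosine_similarity(tt1, tt2)
distance = 1.0 - similarity  # the cosine distance is one minus the similarity

print(f"similarity={similarity:.4f}, distance={distance:.4f}")
```

Because the two vectors point in almost the same direction, the similarity comes out close to 1 and the distance close to 0, even though no coordinate of tt1 is equal to the corresponding coordinate of tt2.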

    If each coordinate in these arrays represents a different sample, then: