I'm trying to understand the difference between the vector.similarity.cosine Cypher function and the gds.similarity.cosine function in Neo4j. According to the Neo4j documentation, both compute cosine similarity, but I’m getting different results from them.
For example, given the following vectors:
When I use vector.similarity.cosine(A, B), I get 0.941, but gds.similarity.cosine(A, B) gives 0.882. Computing the cosine similarity directly from the formula with NumPy also gives 0.882.
Why are these values different? Is there a difference in normalization, implementation details, or expected input formats between the two functions?
Any insights would be appreciated.
The Cypher manual documents (in the "Learn more about the cosine similarity function" dropdown) that Neo4j's vector index uses a normalized cosine similarity function that maps values to the range [0, 1] rather than the traditional [-1, 1]. That accounts for the numbers in the question, which fit the mapping (1 + cos) / 2: (1 + 0.882) / 2 = 0.941.
While the normalized and traditional calculations are equally valid for comparing similarity, you must avoid mixing their results in the same context or comparison.
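To compare scores from the two functions, you can convert one scale to the other. Below is a minimal Python sketch, assuming the normalization is (1 + cos) / 2, which is consistent with the 0.941 and 0.882 reported in the question. The helper names and the two-element example vectors are hypothetical, since the question's actual vectors are not shown.

```python
import math

def cosine(a, b):
    """Traditional cosine similarity, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalized_cosine(a, b):
    """Cosine similarity rescaled to [0, 1] (assumed mapping: (1 + cos) / 2)."""
    return (1.0 + cosine(a, b)) / 2.0

def denormalize(score):
    """Convert a [0, 1] normalized score back to the traditional [-1, 1] scale."""
    return 2.0 * score - 1.0

# Hypothetical vectors, for illustration only.
a = [1.0, 0.0]
b = [1.0, 1.0]
print(round(cosine(a, b), 4))             # 0.7071
print(round(normalized_cosine(a, b), 4))  # 0.8536

# The values from the question line up under this mapping.
print(round(denormalize(0.941), 3))       # 0.882
```

Whichever scale you pick, convert every score to it before comparing or thresholding.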