I want to compute the cosine similarity (or Euclidean distance, if that is easier) between one query row and 10 other rows. The rows contain many NaN values, and any column that is NaN should be ignored.
For example, query :
A B C D E F
3 2 NaN 5 NaN 4
df =
A B C D E F
2 1 3 NaN 4 5
1 NaN 2 4 NaN 3
. . . . . .
. . . . . .
So I just want the cosine similarity computed over only the columns that are non-null in both the query and each row of df. For row 0 in df, for example, A, B, and F are non-null in both query and df.
I then want to print the cosine similarity for each row.
Thanks in advance
The simplest method I can think of is to use sklearn's cosine_similarity.
from sklearn.metrics.pairwise import cosine_similarity
# df1 is assumed to be the query as a one-row DataFrame with the same columns as df
cosine_similarity(df.fillna(0), df1.fillna(0))
# array([[0.51378309],
# [0.86958199]])
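The easiest way to "ignore" NaNs is to just treat them as zeros when computing similarity. Note that this only removes those columns from the dot product: each row's norm still includes values in columns where the other row is NaN, so the result can differ from a similarity restricted strictly to the shared columns.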
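If the similarity should be restricted strictly to the columns that are non-null in both the query and a given row, as the question describes, one option is to mask each pair before computing it. The following is a minimal sketch, assuming the query is held in a pandas Series named query with the same column labels as df; masked_cosine is a hypothetical helper name, not part of pandas or sklearn.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
query = pd.Series([3, 2, np.nan, 5, np.nan, 4], index=list("ABCDEF"))
df = pd.DataFrame(
    [[2, 1, 3, np.nan, 4, 5],
     [1, np.nan, 2, 4, np.nan, 3]],
    columns=list("ABCDEF"),
)

def masked_cosine(row, query):
    """Cosine similarity over the columns that are non-NaN in both inputs.

    Hypothetical helper for illustration only.
    """
    mask = row.notna() & query.notna()          # columns present in both
    a = row[mask].to_numpy(dtype=float)
    b = query[mask].to_numpy(dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # No shared columns (or an all-zero vector) leaves the similarity undefined.
    return np.nan if denom == 0 else float(a @ b / denom)

# One similarity per row of df, indexed like df.
similarities = df.apply(lambda row: masked_cosine(row, query), axis=1)
print(similarities)
```

For row 0 only A, B, and F enter the calculation, and for row 1 only A, D, and F, so the values come out higher (roughly 0.949 and 0.971) than the zero-filled results above. Returning NaN when no columns overlap is a design choice in this sketch; returning 0 would be another reasonable convention.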