pytorch, lstm, cosine-similarity, siamese-network, pairwise-distance

Set the range of pairwise distance and cosine similarity between 0 and 1


I wrote a BiLSTM Siamese network that measures string similarity using either pairwise (Euclidean) distance or cosine similarity, as follows:

import torch
import torch.nn as nn

# BiLSTM is a user-defined encoder module (not shown here).
class SiameseNetwork(nn.Module):
    def __init__(self, num_layers, dropout, weight_matrix, vocabs, similarity_measure):
        super(SiameseNetwork, self).__init__()
        self.lstm_network = BiLSTM(num_layers, weight_matrix, vocabs)
        self.fc_drop = nn.Dropout(p=dropout)
        self.similarity_measure = similarity_measure
        if self.similarity_measure == 'euclidean_distance':
            self.sm = nn.PairwiseDistance(p=2)
        else:
            self.sm = nn.functional.cosine_similarity

    def forward(self, input1, input2):
        # Both inputs go through the same encoder, so the weights are shared.
        output1 = self.lstm_network(input1)
        output2 = self.lstm_network(input2)

        # Dropout is applied to each branch separately.
        out1 = self.fc_drop(output1)
        out2 = self.fc_drop(output2)

        x = self.sm(out1, out2)
        if self.similarity_measure == 'euclidean_distance':
            x = 1 - x  # The larger the x value is, the more similar the strings are.
        x = torch.sigmoid(x)

        return x

I used torch.sigmoid to map the similarity into the range 0 to 1. However, with sigmoid, a pair of identical strings does not get a similarity of 1. How can I map pairwise distance and cosine similarity into the range 0 to 1, with 0 for dissimilar string pairs and 1 for similar ones? Any help would be greatly appreciated. Thank you!
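A quick check shows why identical strings top out below 1. For identical inputs, and with dropout disabled, the Euclidean branch computes roughly 1 - 0 = 1 and the cosine branch returns roughly 1. In both cases sigmoid(1) is only about 0.731. The sketch below replaces the BiLSTM outputs with random embeddings, which is an assumption made purely for illustration, and needs nothing beyond PyTorch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identical embeddings stand in for the BiLSTM outputs of the same string.
# Assumes dropout is off (e.g. after model.eval()); in training mode the two
# branches get different dropout masks, so even identical inputs differ.
emb = torch.randn(4, 16)

dist = nn.PairwiseDistance(p=2)(emb, emb)        # ~0 (PairwiseDistance adds eps=1e-6)
cos = nn.functional.cosine_similarity(emb, emb)  # ~1 along dim=1

euclid_score = torch.sigmoid(1 - dist)  # the question's Euclidean branch
cosine_score = torch.sigmoid(cos)       # the question's cosine branch
print(euclid_score)  # every entry is about 0.7311, not 1
print(cosine_score)  # every entry is about 0.7311, not 1

# Sigmoid squeezes the whole cosine range [-1, 1] into about [0.269, 0.731].
bounds = torch.sigmoid(torch.tensor([-1.0, 1.0]))
print(bounds)
```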


Solution

  • There are several ways to solve this. In the cosine similarity case, the output is already in the range [-1, 1], so you can either clip all values below 0 to 0 (recommended):

    x = torch.clamp(x, 0, 1)
    

    or rescale them linearly to the range [0, 1] (not recommended):

    x = (x + 1)/2
    

    For the Euclidean distance case, your approach (1 - x followed by sigmoid) is fine. If you want a "harder" threshold, consider:

    x = torch.sigmoid(alpha * x)  # here x = 1 - distance; alpha > 1 pushes results more aggressively toward 0 and 1
    

    Alternatively, since x is a distance and therefore always x >= 0, you can turn it into a similarity with a decaying exponential, for example:

    x = self.sm(out1, out2)
    x = torch.exp(-alpha * x)  # alpha > 0: distance 0 maps to 1, large distances approach 0
    

    Note that you should not apply sigmoid in the cosine similarity case. Sigmoid maps [-1, 1] to only about [0.27, 0.73], so the result can never reach 0 or 1.
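Here is a minimal sketch of the four mappings from the answer, written as standalone functions. In the model, one of these would replace the `torch.sigmoid(x)` call in `forward`. The function names and the default `alpha` values are illustrative assumptions and are not part of the original answer. `alpha` would need tuning on real data.

```python
import torch

# Hypothetical helper names; alpha defaults are arbitrary illustrative choices.

def cosine_clamped(cos_sim: torch.Tensor) -> torch.Tensor:
    """Recommended option: negative cosine similarity becomes 0, [0, 1] is kept as-is."""
    return torch.clamp(cos_sim, 0, 1)

def cosine_rescaled(cos_sim: torch.Tensor) -> torch.Tensor:
    """Alternative: linearly map [-1, 1] onto [0, 1] (-1 -> 0, 0 -> 0.5, 1 -> 1)."""
    return (cos_sim + 1) / 2

def distance_sigmoid(dist: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """The question's 1 - d mapping with a steeper sigmoid (alpha > 1)."""
    return torch.sigmoid(alpha * (1 - dist))

def distance_exp(dist: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Exponential decay: d = 0 -> 1, d -> infinity -> 0 (requires alpha > 0)."""
    return torch.exp(-alpha * dist)

cos = torch.tensor([-1.0, 0.0, 0.5, 1.0])
dist = torch.tensor([0.0, 0.5, 1.0, 3.0])

print(cosine_clamped(cos).tolist())   # [0.0, 0.0, 0.5, 1.0]
print(cosine_rescaled(cos).tolist())  # [0.0, 0.5, 0.75, 1.0]
print(distance_sigmoid(dist))
print(distance_exp(dist))
```

The sigmoid variant only approaches 1 for identical strings. With the default `alpha` here, sigmoid(5) is about 0.993. By contrast, exp(-alpha * d) gives 1 at d = 0, up to the small `eps` that `nn.PairwiseDistance` adds.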