python, python-3.x, nlp

Word/sentence similarities


I am trying to determine whether a given word or set of words is similar to a definition.

Example definition: "vegetarian User"

Now I want to check a set of sentences like the ones below:

sentences = ['vegetarian User',
            'user sometimes eats chicken',
            'user is vegetarian',
            'user only eats fruits',
            'user likes fish']

I tried using a sentence transformer, like below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities)

But this is not giving me the expected results.

What is the best approach to achieve results like below?

[False,True,True,False]

Is this doable with NLP or some other technique?


Solution

  • Yes, it’s definitely doable using NLP! The key here is that you don’t need a full similarity matrix; you want to check if each sentence is semantically similar to the given definition.

    ✅ Better approach:
    1. Encode both the definition and the sentences using a sentence transformer.
    2. Compute the cosine similarity between the definition embedding and each sentence embedding.
    3. Set a threshold (e.g., 0.6 or 0.7) to decide whether they are "similar enough."

    from sentence_transformers import SentenceTransformer, util
    # Load the pre-trained model
    model = SentenceTransformer("all-mpnet-base-v2")
    
    # Definition and sentences
    definition = "vegetarian User"
    sentences = [
      'vegetarian User',
      'user sometimes eats chicken',
      'user is vegetarian',
      'user only eats fruits',
      'user likes fish'
    ]
    
    # Encode the definition and sentences
    definition_embedding = model.encode(definition, convert_to_tensor=True)
    sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
    
    # Compute cosine similarities
    similarities = util.cos_sim(definition_embedding, sentence_embeddings)[0]
    
    # Set a threshold for similarity (tune this value as needed)
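    threshold = 0.6
    # Compare element-wise, then convert the boolean tensor to plain Python bools
    results = (similarities >= threshold).tolist()
    
    # Print results
    # Illustrative output only; actual values depend on the model and threshold:
    print(results)  # e.g. [True, False, True, False, False]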
    

    💡 Explanation: util.cos_sim computes the cosine similarity between the definition and each sentence. A sentence counts as True when its similarity is at or above the threshold. Raise the threshold for stricter matching, lower it for looser matching, and tune it against a few sentences whose correct answer you already know.
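One way to tune the threshold, sketched under assumptions: given similarity scores for a few sentences you have labeled by hand, try each observed score as a cut-off and keep the one that agrees with the labels most often. The function name `pick_threshold` and the scores below are hypothetical placeholders, not real model output.

```python
def pick_threshold(scores, labels):
    """Return the candidate cut-off with the highest accuracy on labeled scores."""
    best_threshold, best_accuracy = None, -1.0
    # Try each observed score as a cut-off; a sentence is predicted True
    # when its score is at or above the candidate.
    for candidate in sorted(set(scores)):
        predictions = [score >= candidate for score in scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


# Hypothetical similarity scores and hand-assigned labels (assumed values).
scores = [0.91, 0.35, 0.78, 0.52, 0.30]
labels = [True, False, True, False, False]

threshold, accuracy = pick_threshold(scores, labels)
print(threshold, accuracy)
```

With only a handful of labeled sentences the chosen cut-off can overfit, so it is worth checking it against sentences that were not used for tuning.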

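    🔍 Why the original approach didn’t work: model.similarity is only available in sentence-transformers 3.0 and later, so it fails on older versions. Where it does exist, model.similarity(embeddings, embeddings) returns the full sentence-to-sentence matrix of raw scores rather than True/False values. Only the row for the definition (row 0 here, since 'vegetarian User' is the first sentence) is relevant, and it still needs a threshold applied.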
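To illustrate the difference between a full pairwise matrix and a definition-to-sentence comparison, here is a minimal pure-Python sketch. The 2-D vectors are hypothetical stand-ins for real embeddings, and `cosine` is a hand-written helper, not the library's function.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-D "embeddings"; index 0 plays the role of the definition.
vectors = [
    [1.0, 0.0],  # definition
    [0.0, 1.0],  # unrelated sentence
    [3.0, 4.0],  # partly related sentence
]

# Full pairwise matrix: entry [i][j] compares item i with item j.
matrix = [[cosine(u, v) for v in vectors] for u in vectors]

# Only row 0 answers "how close is each item to the definition?"
definition_row = matrix[0]
print(definition_row)
```

The other rows compare the sentences with each other, which is not what the question asks for.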