I am trying to determine whether a given word or set of words is similar to a definition.
Example definition: "vegetarian User"
Now I want to check a set of sentences like the ones below:
sentences = ['vegetarian User',
'user sometimes eats chicken',
'user is vegetarian',
'user only eats fruits',
'user likes fish']
I tried using a sentence transformer, as shown below:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings,embeddings)
print(similarities)
But this is not giving me the expected results.
What is the best approach to get results like the ones below?
[False,True,True,False]
Is this doable with NLP or some other technique?
Yes, it’s definitely doable using NLP! The key here is that you don’t need a full similarity matrix; you want to check if each sentence is semantically similar to the given definition.
✅ Better Approach:
1. Encode both the definition and the sentences with a sentence transformer.
2. Compute the cosine similarity between the definition embedding and each sentence embedding.
3. Set a threshold (e.g., 0.6 or 0.7 as a starting point; the right value depends on the model and your data) to decide whether they are "similar enough."
from sentence_transformers import SentenceTransformer, util
# Load the pre-trained model
model = SentenceTransformer("all-mpnet-base-v2")
# Definition and sentences
definition = "vegetarian User"
sentences = [
'vegetarian User',
'user sometimes eats chicken',
'user is vegetarian',
'user only eats fruits',
'user likes fish'
]
# Encode the definition and sentences
definition_embedding = model.encode(definition, convert_to_tensor=True)
sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
# Compute cosine similarities
similarities = util.cos_sim(definition_embedding, sentence_embeddings)[0]
# Set a threshold for similarity (tune this value as needed)
threshold = 0.6
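# Compare against the threshold and convert the tensor to plain Python booleans
results = (similarities >= threshold).tolist()
# Print results
print(results) # Illustrative output only (actual values depend on the model and threshold): [True, False, True, False, False]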
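💡 Explanation:
util.cos_sim computes the cosine similarity between the definition and each sentence.
Threshold tuning: If the similarity is at or above the threshold, the sentence is marked True. Adjust the threshold based on how strict you want the matching to be, ideally by checking it against a handful of sentences you have labeled yourself.
Keep in mind that embedding similarity mostly reflects how closely related two texts are in topic, so a sentence such as 'user sometimes eats chicken' can still score fairly high against "vegetarian User" even though it contradicts the definition.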
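To make the cosine-plus-threshold step concrete, here is a minimal sketch in plain Python that does not need the model. The helper names (cosine_similarity, matches) and the 3-dimensional vectors are invented for illustration; real embeddings from all-mpnet-base-v2 have many more dimensions and will give different scores.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def matches(definition_vec, sentence_vecs, threshold):
    """Return one plain Python bool per sentence vector."""
    return [cosine_similarity(definition_vec, v) >= threshold for v in sentence_vecs]

# Hypothetical toy "embeddings", made up for this example.
definition_vec = [1.0, 0.0, 0.0]
sentence_vecs = [
    [1.0, 0.0, 0.0],  # same direction as the definition, similarity 1.0
    [1.0, 1.0, 0.0],  # 45 degrees away, similarity about 0.707
    [0.0, 1.0, 0.0],  # orthogonal, similarity 0.0
]

print(matches(definition_vec, sentence_vecs, threshold=0.6))  # [True, True, False]
print(matches(definition_vec, sentence_vecs, threshold=0.8))  # [True, False, False]
```

The second vector flips from True to False when the threshold moves from 0.6 to 0.8, which is why the threshold is worth tuning on your own examples.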
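🔍 Why the original approach didn’t work: model.similarity(embeddings, embeddings) computes a full sentence-to-sentence matrix, not definition-to-sentence comparisons, and it returns raw scores with no threshold applied, so it never produces True/False values. Note that model.similarity is only available in sentence-transformers v3.0 and later; on older versions, use util.cos_sim instead.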
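Since the definition is also the first entry in the sentences list, the definition-to-sentence scores are simply the first row of a pairwise matrix. The sketch below shows that relationship with made-up 2-dimensional vectors and a hypothetical pairwise helper; it illustrates the idea only and is not the library's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pairwise(vecs):
    """Full matrix: entry [i][j] compares vector i with vector j."""
    return [[cosine_similarity(a, b) for b in vecs] for a in vecs]

# Hypothetical toy vectors; index 0 plays the role of the definition.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

matrix = pairwise(vecs)                                       # 3 x 3 matrix
one_vs_many = [cosine_similarity(vecs[0], v) for v in vecs]   # definition vs. each vector

print(matrix[0] == one_vs_many)  # True
```

So the original matrix already contained the needed scores in its first row; what was missing was selecting that row and applying a threshold.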