Hi, I have been playing with this code for the last 3-4 days but no luck. Here is the code:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences3 = ['USPS IV USA Tracking Status WTA Portal to content Home ures Plans About Us Glossary Contact Us Home parceltrack USPS IV USA Tracking Status ',
'UPS Mail Innovations USA Tracking Status to content Home Contact Us Home parceltrack UPS Mail Innovations USA Tracking Status',
'USPS USA Tracking Status | WTA Portal - WTA Portal Skip to cont About Us Glossary Contact Us Home parceltrack USPS USA Tracking Status ']
# remove non-ASCII characters from each sentence
sentences3 = [x.encode('ascii', 'ignore').decode('ascii') for x in sentences3]
print(sentences3)
embeddings3 = model.encode(sentences3, convert_to_tensor=True)
print(embeddings3)
clusters = util.community_detection(embeddings3, threshold=0.2, min_community_size=1)
print(clusters)
The line clusters = util.community_detection(embeddings3, threshold=0.2, min_community_size=1) never returns. I waited for about an hour but no luck. I have a decent M2 Mac with 16 GB of RAM, so I don't think resources are the issue.
Does anyone have tips for debugging this? Thanks
This was a known issue in SBERT that was supposed to be fixed. You can work around it by setting a higher threshold: with threshold=0.8 it runs in seconds. Adding more sentences also helps.
util.community_detection(embeddings3, threshold=0.8, min_community_size=1)
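To pick a threshold that will actually terminate quickly, it can help to inspect the pairwise cosine similarities of your embeddings first. Here is a minimal sketch using plain NumPy on stand-in vectors; in your case you would pass embeddings3.cpu().numpy() instead. The pairwise_cosine helper is illustrative, not part of the SBERT API:

```python
import numpy as np

def pairwise_cosine(emb: np.ndarray) -> np.ndarray:
    """Cosine similarity matrix for the row vectors in emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

# Stand-in for embeddings3.cpu().numpy(): three 4-d vectors,
# the first two nearly parallel, the third orthogonal to both.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

sim = pairwise_cosine(emb)
# The off-diagonal similarities show which thresholds can
# possibly merge two sentences into one community.
off_diag = sim[~np.eye(len(emb), dtype=bool)]
print("max off-diagonal similarity:", off_diag.max())
```

If the chosen threshold sits above most off-diagonal values, community_detection has very little merging work to do and tends to finish quickly.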