
Python sentence transformer community detection code getting stuck on line 16


Hi, I have been playing with this code for the last 3-4 days with no luck. Here is the code:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences3 = ['USPS IV USA Tracking Status WTA Portal to content Home ures Plans About Us Glossary Contact Us Home  parceltrack  USPS IV USA Tracking Status ', 
'UPS Mail Innovations USA Tracking Status  to content Home  Contact Us Home  parceltrack  UPS Mail Innovations USA Tracking Status', 
'USPS USA Tracking Status | WTA Portal - WTA Portal Skip to cont About Us Glossary Contact Us Home  parceltrack  USPS USA Tracking Status ']
# remove non ascii characters from list
sentences3 = [x.encode('ascii', 'ignore').decode('ascii') for x in sentences3]
print(sentences3)
embeddings3 = model.encode(sentences3, convert_to_tensor=True)
print(embeddings3)

clusters = util.community_detection(embeddings3, threshold=0.2, min_community_size=1)
print(clusters)

The line clusters = util.community_detection(embeddings3, threshold=0.2, min_community_size=1) never outputs anything. I have waited about an hour for it to finish, with no luck. I have a decent Mac Pro M2 with 16 GB of RAM, so I don't think resources are the issue.

Does anyone have tips for debugging this? Thanks.


Solution

  • This was a known issue in SBERT that was supposed to be fixed. You can work around it by setting a higher threshold: with threshold=0.8 the call finishes in seconds. Adding more sentences also avoids the problem.

    util.community_detection(embeddings3, threshold=0.8, min_community_size=1)
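
As a debugging aid, it can help to look at the pairwise cosine similarities directly and count how many sentences clear a given threshold. The following is only a minimal sketch using the standard library and made-up toy vectors in place of real embeddings; the helper names `cosine_similarity_matrix` and `neighbours_above` are hypothetical and not part of sentence-transformers. With the real embeddings, `util.cos_sim(embeddings3, embeddings3)` should give the corresponding similarity matrix.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarities of equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def neighbours_above(sim, threshold):
    """For each row, count the entries (itself included) at or above threshold."""
    return [sum(1 for s in row if s >= threshold) for row in sim]


# Toy stand-ins for sentence embeddings (hypothetical values, not model output).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
]

sim = cosine_similarity_matrix(toy_embeddings)
for threshold in (0.2, 0.8):
    # Shows how many candidates each sentence has at this threshold.
    print(threshold, neighbours_above(sim, threshold))
```

Comparing the counts at 0.2 and 0.8 shows how permissive the lower threshold is for a given set of sentences, which may help when picking a value for the workaround above.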