[SOLVED] Inspect all probabilities of BERTopic model

Say I build a BERTopic model using

from bertopic import BERTopic
topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20)
topics, probs = topic_model.fit_transform(docs)

Inspecting probs gives me just a single value for each item in docs.

probs
array([0.51914467, 0.        , 0.        , ..., 1.        , 1.        ,
       1.        ])

I would like the entire probability vector across all topics (so in this case, where nr_topics=20, I want a vector of 20 probabilities for each item in docs). In other words, if I have N items in docs and K topics, I would like an NxK output.

Solution

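To get the probability of every topic for each document, rather than only the probability of the assigned topic, pass one more argument, calculate_probabilities=True, when creating the model: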

topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20, calculate_probabilities=True)
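
Putting it together, here is a minimal sketch of the full flow. The function names fit_with_full_probabilities and most_likely_topic are made up for illustration and are not part of BERTopic. docs is assumed to be a list of strings, as in the question.

```python
def fit_with_full_probabilities(docs, nr_topics=20):
    """Fit BERTopic and return (topics, probs), with probs as a 2-D matrix."""
    # Imported here so the helper below can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic(
        n_gram_range=(1, 1),
        nr_topics=nr_topics,
        calculate_probabilities=True,  # ask for one probability per topic
    )
    topics, probs = topic_model.fit_transform(docs)
    return topics, probs


def most_likely_topic(prob_row):
    """Return (column_index, probability) of the largest entry in one row of probs."""
    best = max(range(len(prob_row)), key=lambda k: prob_row[k])
    return best, prob_row[best]
```

After topics, probs = fit_with_full_probabilities(docs), probs.shape should have one row per document and one column per topic. As far as I know the outlier topic -1 does not get a column, so with nr_topics=20 the number of columns may be slightly below 20, and the rows do not necessarily sum to 1.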

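Note: calculate_probabilities=True only takes effect when the clustering model is HDBSCAN, which is what BERTopic uses by default. HDBSCAN is the clustering step, not the embedding model; the default embedding model for English is all-MiniLM-L6-v2 and is unrelated to this option. The probabilities you get are an approximation derived from HDBSCAN, and computing them can slow down fitting on large collections.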
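
If you want to be explicit about the clustering model, you can build the HDBSCAN instance yourself and pass it in through the hdbscan_model argument. This is only a sketch: the values in HDBSCAN_KWARGS are example settings rather than a recommendation, and build_topic_model is a hypothetical helper name. prediction_data=True is set because HDBSCAN needs it to compute membership probabilities.

```python
# Example HDBSCAN settings (illustrative values, adjust to your data).
HDBSCAN_KWARGS = {
    "min_cluster_size": 15,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,  # needed for per-topic probabilities
}


def build_topic_model(nr_topics=20):
    """Return an unfitted BERTopic model that uses an explicit HDBSCAN instance."""
    # Imported here so this module loads even if the libraries are not installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    hdbscan_model = HDBSCAN(**HDBSCAN_KWARGS)
    return BERTopic(
        n_gram_range=(1, 1),
        nr_topics=nr_topics,
        hdbscan_model=hdbscan_model,
        calculate_probabilities=True,
    )
```

Calling build_topic_model().fit_transform(docs) should then behave like the one-line version above, just with the clustering settings under your control.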

Official documentation: https://maartengr.github.io/BERTopic/api/bertopic.html

The documentation describes this parameter as well.