bert-language-model  topic-modeling  perplexity

How to calculate perplexity of BERTopic?


Is there a way to calculate the perplexity of a BERTopic model? I could not find anything for this in the BERTopic library or elsewhere.


Solution

  • I managed to figure out how to get the log perplexity and then convert it back to perplexity:

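    import numpy as np
    from bertopic import BERTopic

    # calculate_probabilities=True makes fit_transform return the full
    # document-topic probability matrix instead of one value per document
    model = BERTopic(top_n_words=15, calculate_probabilities=True)

    topics, probs = model.fit_transform(docs)  # docs = dataset (list of strings)
    # Average negative log of each document's total topic probability
    log_perplexity = -1 * np.mean(np.log(np.sum(probs, axis=1)))
    # Exponentiate to convert log perplexity back to perplexity
    perplexity = np.exp(log_perplexity)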
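
To check the arithmetic without fitting a model, here is a minimal sketch of the same formula applied to a small hand-made probability matrix. The helper name `perplexity_from_probs`, the `eps` guard against `log(0)`, and the toy values are assumptions for illustration, not part of BERTopic. Note that this score is built from document-topic probabilities, so it may not be directly comparable to the word-likelihood perplexity reported for models such as LDA.

```python
import numpy as np

def perplexity_from_probs(probs, eps=1e-12):
    """Perplexity-like score from a (n_docs, n_topics) probability matrix.

    Hypothetical helper: mirrors the formula in the answer above and
    clips each row sum at `eps` so that log(0) cannot occur.
    """
    probs = np.asarray(probs, dtype=float)
    doc_mass = probs.sum(axis=1)             # total topic probability per document
    doc_mass = np.clip(doc_mass, eps, None)  # guard against zero row sums
    log_perplexity = -np.mean(np.log(doc_mass))
    return float(np.exp(log_perplexity))

# Toy matrix (made-up values): row sums are 1.0 and 0.5
toy_probs = np.array([[0.50, 0.50],
                      [0.25, 0.25]])
print(perplexity_from_probs(toy_probs))  # approximately 1.4142, i.e. sqrt(2)
```

With a fitted model, the same helper could be called as `perplexity_from_probs(probs)` on the matrix returned by `fit_transform`. A row whose probabilities sum to 1 contributes nothing to the score, so values close to 1 mean most documents have nearly all of their probability mass spread over the discovered topics.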