I can't find a working way to compute coherence scores for topic models created with BERTopic. I am new to these NLP methods and especially new to Python. I am currently trying to derive topic coherence scores from my model, although it may be that another evaluation metric would be more suitable.
Here's my code showing my data setup and how I am working with a pre-trained, locally saved model loaded from my Drive.
%%capture
# install and load libraries
!pip install bertopic
from bertopic import BERTopic

# mount google drive, permit access
from google.colab import drive
drive.mount('/content/drive', force_remount = True)

# import data and keep the text column as a plain list of strings
import pandas as pd
data = pd.read_csv("/content/drive/MyDrive/BERTopic_test_data.csv")
docs = data["text"].tolist()

# load in the pre-saved model
my_model = BERTopic.load('/content/drive/MyDrive/my_model')

# create the topics using the pre-saved model
topic_model = my_model
topics, _ = topic_model.fit_transform(docs)
To provide some more context, here are the components of the BERTopic model, as well as the parameters chosen when training my_model:
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer
import spacy
from spacy.lang.en.examples import sentences

# defining model components, as well as parameter tuning
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = UMAP(n_neighbors = 15, n_components = 5, min_dist = 0.05, random_state = 42)
hdbscan_model = HDBSCAN(min_cluster_size = 25, min_samples = 10,
                        gen_min_span_tree = True,
                        prediction_data = True)
# "stopwords" is a stop word list defined elsewhere (not shown here)
vectorizer_model = CountVectorizer(ngram_range = (1, 2), stop_words = stopwords)

# building the model
my_model = BERTopic(
    umap_model = umap_model,
    hdbscan_model = hdbscan_model,
    embedding_model = embedding_model,
    vectorizer_model = vectorizer_model,
    top_n_words = 10,
    language = 'english',
    verbose = True
)
I have tried the solution below, which I found online, but I'm met with the error message "AttributeError: 'BERTopic' object has no attribute 'id2word'".
# import library from gensim
from gensim.models import CoherenceModel
# instantiate topic coherence model
cm = CoherenceModel(model=topic_model, texts=docs, coherence='c_v')
# get topic coherence score
coherence_bert = cm.get_coherence()
print(coherence_bert)
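From reading around, I suspect the error occurs because CoherenceModel expects either a gensim topic model or explicit topics plus a dictionary, not a BERTopic object. Below is a sketch of the workaround I have been attempting, where I pass the top words per topic together with a gensim Dictionary built from my tokenized documents. I am not sure whether this is the right way to do it (the simple .split() tokenization and the filtering of the -1 outlier topic are my own assumptions):

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

# tokenize the documents (naive whitespace split, just as a first attempt)
tokenized_docs = [doc.split() for doc in docs]
dictionary = Dictionary(tokenized_docs)

# collect the top words per topic from the fitted BERTopic model, skipping the -1 outlier topic
topic_words = [
    [word for word, _ in topic_model.get_topic(topic_id) if word]
    for topic_id in topic_model.get_topics()
    if topic_id != -1
]

# build the coherence model from explicit topics instead of the BERTopic object
cm = CoherenceModel(topics = topic_words,
                    texts = tokenized_docs,
                    dictionary = dictionary,
                    coherence = 'c_v')
print(cm.get_coherence())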
Usually, the performance of an NLP model is assessed through the metrics of Precision (P), Recall (R) and F1. There are basically four types of prediction outcome; precision and recall are built from True Positives (TP), False Positives (FP) and False Negatives (FN), i.e. whether each prediction matches the expected label or not.
You can easily obtain these metrics with the scikit-learn library if you build two lists to compare, one of true labels and one of predictions:
# Library required
from sklearn.metrics import precision_recall_fscore_support

# List holding the true (expected) labels
ny_true = []

# List holding the predicted labels
ny_pred = []

# both lists must be filled (and of equal length) before computing the scores
print(precision_recall_fscore_support(ny_true, ny_pred, average='macro'))
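As a quick illustration (with made-up labels, purely to show the call), the function returns the macro-averaged precision, recall and F1; the fourth element of the tuple (per-class support) is None when an average is requested:

from sklearn.metrics import precision_recall_fscore_support

# illustrative labels only: what was expected vs. what the model predicted
ny_true = ["topic_a", "topic_b", "topic_a", "topic_c"]
ny_pred = ["topic_a", "topic_b", "topic_b", "topic_c"]

precision, recall, f1, _ = precision_recall_fscore_support(ny_true, ny_pred, average='macro')
print(precision, recall, f1)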