Tags: lda, hyperparameters, mallet

How does the number of Gibbs sampling iterations impact Latent Dirichlet Allocation?


The MALLET documentation mentions the following:

--num-iterations [NUMBER]

The number of sampling iterations should be a trade off between the time taken to complete sampling and the quality of the topic model.

MALLET furthermore provides an example:

// Run the model for 50 iterations and stop (this is for testing only, 
//  for real applications, use 1000 to 2000 iterations)
model.setNumIterations(50);

It is obvious that too few iterations lead to bad topic models.

However, does increasing the number of Gibbs sampling iterations necessarily improve the quality of the topic model (measured by perplexity, topic coherence, or performance on a downstream task)? Or can model quality decrease if --num-iterations is set too high?

On a personal project, increasing the number of iterations from 100 to 1000 did not change the accuracy (measured as Mean Reciprocal Rank) of a downstream task when averaged over 10-fold cross-validation. Within the individual cross-validation splits, however, the performance changed significantly, even though the random seed was fixed and all other parameters were kept the same. What background knowledge about Gibbs sampling am I missing to explain this behavior?

I am using symmetric priors for alpha and beta without hyperparameter optimization, and the parallelized LDA implementation provided by MALLET.
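
For concreteness, the setup described above corresponds roughly to the following sketch (not the actual project code; the topic count, prior values, thread count, and input file name are placeholders):

import java.io.File;
import java.io.IOException;

import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.InstanceList;

public class TrainLda {
    public static void main(String[] args) throws IOException {
        // Placeholder path: any serialized MALLET InstanceList works here.
        InstanceList training = InstanceList.load(new File("instances.mallet"));

        int numTopics = 100;  // placeholder topic count
        // Symmetric priors: alphaSum is split evenly over topics, beta applies to every word type.
        ParallelTopicModel model = new ParallelTopicModel(numTopics, 50.0, 0.01);

        model.addInstances(training);
        model.setNumThreads(4);
        model.setRandomSeed(42);         // fixed seed, as in the question
        model.setOptimizeInterval(0);    // 0 = no hyperparameter optimization, priors stay symmetric
        model.setNumIterations(1000);    // the parameter under discussion
        model.estimate();
    }
}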


Solution

  • The 1000-iteration setting is designed to be a safe number for most collection sizes, and also to communicate "this is a large, round number, so don't think it's very precise". It's likely that smaller numbers will be fine. I once ran a model for 1,000,000 iterations, and fully half the token assignments never changed from the 1000-iteration model. (A sketch for checking this on your own data follows below.)

    Could you be more specific about the cross-validation results? Was it that different folds had different MRRs, which were individually stable across iteration counts? Or that individual fold MRRs varied by iteration count but balanced out in the overall mean? It's not unusual for different folds to have different "difficulty". Fixing the random seed also wouldn't make a difference if the data is different.
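
Not part of the original answer, but here is a rough way to measure how many token-topic assignments change when sampling continues past a checkpoint, using the assignments stored in ParallelTopicModel. The helper methods and iteration counts are illustrative only:

import java.io.IOException;
import java.util.ArrayList;

import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.topics.TopicAssignment;

public class AssignmentStability {

    // Copy the current token-topic assignments into plain int arrays.
    static ArrayList<int[]> snapshot(ParallelTopicModel model) {
        ArrayList<int[]> copy = new ArrayList<>();
        for (TopicAssignment ta : model.getData()) {
            int[] topics = new int[ta.topicSequence.getLength()];
            for (int pos = 0; pos < topics.length; pos++) {
                topics[pos] = ta.topicSequence.getIndexAtPosition(pos);
            }
            copy.add(topics);
        }
        return copy;
    }

    // Fraction of tokens whose topic assignment differs between two snapshots.
    static double changedFraction(ArrayList<int[]> before, ArrayList<int[]> after) {
        long changed = 0, total = 0;
        for (int doc = 0; doc < before.size(); doc++) {
            int[] a = before.get(doc);
            int[] b = after.get(doc);
            for (int pos = 0; pos < a.length; pos++) {
                if (a[pos] != b[pos]) changed++;
                total++;
            }
        }
        return (double) changed / total;
    }

    // Sample 1000 iterations, snapshot, keep sampling, compare.
    // This assumes estimate() continues from the current assignments when
    // called a second time, which ParallelTopicModel appears to do.
    static void compareCheckpoints(ParallelTopicModel model) throws IOException {
        model.setNumIterations(1000);
        model.estimate();
        ArrayList<int[]> early = snapshot(model);

        model.setNumIterations(9000);   // 9000 further iterations
        model.estimate();
        ArrayList<int[]> late = snapshot(model);

        System.out.printf("Fraction of token assignments changed: %.3f%n",
                changedFraction(early, late));
    }
}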