I am using text2vec on 230k documents. I am trying to find the best number of topics for my document-term matrix using perplexity. Computing it for one topic count at a time works perfectly fine, but when I loop over the range 2 to 25 it fails and I can't tell why. Could someone please tell me what is wrong?
##Using perplexity for hold out set
t1 <- Sys.time()
perplex <- c()
for (i in 2:25) {
  set.seed(17)
  lda_model <- LDA$new(n_topics = i)
  doc_topic_distr <- lda_model$fit_transform(x = dtm, progressbar = F)
  perplex[i] <- text2vec::perplexity(sample.dtm,
    topic_word_distribution = lda_model$topic_word_distribution,
    doc_topic_distribution = new_doc_topic_distr)
}
print(difftime(Sys.time(), t1, units = 'sec'))
INFO [2019-10-23 13:01:43] early stopping at 80 iteration
INFO [2019-10-23 13:01:45] early stopping at 20 iteration
INFO [2019-10-23 13:01:53] early stopping at 70 iteration
INFO [2019-10-23 13:01:55] early stopping at 20 iteration
Error in text2vec::perplexity(sample.dtm, topic_word_distribution = lda_model$topic_word_distribution, :
nrow(topic_word_distribution) == ncol(doc_topic_distribution) is not TRUE
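This is because you need to recalculate new_doc_topic_distr inside the loop. It is never updated there, so it presumably still holds the result from an earlier model with a different number of topics. As soon as i differs from that number, the column count of new_doc_topic_distr no longer matches the row count of lda_model$topic_word_distribution, which is exactly the check that fails in the error message.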
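A minimal sketch of the corrected loop, assuming the hold-out distribution is meant to come from `lda_model$transform()` applied to `sample.dtm`. The question does not show how `dtm` and `sample.dtm` were built, so the stand-in data below (taken from text2vec's bundled `movie_review` set) and the names `topic_range` and `full_dtm` are only illustrative assumptions:

```r
library(text2vec)

# Stand-in data (assumption): the question's `dtm` and `sample.dtm` are not
# shown, so a small training matrix and hold-out matrix are built here.
data("movie_review")
tokens <- word_tokenizer(tolower(movie_review$review[1:300]))
it_all <- itoken(tokens, ids = movie_review$id[1:300], progressbar = FALSE)
vocab <- prune_vocabulary(create_vocabulary(it_all), term_count_min = 5)
full_dtm <- create_dtm(it_all, vocab_vectorizer(vocab))
dtm <- full_dtm[1:200, ]          # training documents
sample.dtm <- full_dtm[201:300, ] # hold-out documents

topic_range <- 2:25
# Preallocate and name by topic count, so there is no empty slot at index 1
perplex <- numeric(length(topic_range))
names(perplex) <- topic_range

for (k in seq_along(topic_range)) {
  set.seed(17)
  lda_model <- LDA$new(n_topics = topic_range[k])
  doc_topic_distr <- lda_model$fit_transform(x = dtm, progressbar = FALSE)

  # Re-infer the hold-out topic distribution with the model just fitted,
  # so it has one column per topic of the current model
  new_doc_topic_distr <- lda_model$transform(sample.dtm)

  perplex[k] <- perplexity(
    sample.dtm,
    topic_word_distribution = lda_model$topic_word_distribution,
    doc_topic_distribution = new_doc_topic_distr
  )
}

print(perplex)
```

Naming the result vector by topic count also avoids the `NA` that `perplex[i]` would otherwise leave at position 1 when the loop starts at 2.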