I would like to create a separate word cloud for each of the 8 topics in my LDA model. I extracted the top 40 words for each of the 8 topics, which gives an object of length 320 containing the top words and their probabilities. I am struggling to access the terms and the probabilities in my top_words_vector object. It is hard to reproduce because of the tmResult object, but any hint would be much appreciated:
library(tm)
library(topicmodels)
textdata <- base::readRDS(url("https://slcladal.github.io/data/sotu_paragraphs.rda", "rb"))
english_stopwords <- readLines("https://slcladal.github.io/resources/stopwords_en.txt", encoding = "UTF-8")
corpus <- Corpus(DataframeSource(textdata))
processedCorpus <- tm_map(corpus, content_transformer(tolower))
processedCorpus <- tm_map(processedCorpus, removeWords, english_stopwords)
processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
processedCorpus <- tm_map(processedCorpus, removeNumbers)
processedCorpus <- tm_map(processedCorpus, stemDocument, language = "en")
processedCorpus <- tm_map(processedCorpus, stripWhitespace)
minimumFrequency <- 5
DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minimumFrequency, Inf))))
sel_idx <- slam::row_sums(DTM) > 0
DTM <- DTM[sel_idx, ]
textdata <- textdata[sel_idx, ]
K <- 8
set.seed(9161)
# compute the LDA model, inference via 100 iterations of Gibbs sampling
topicModel <- LDA(DTM, K, method="Gibbs", control=list(iter = 100, verbose = 25))
tmResult <- topicmodels::posterior(topicModel)
tmResult$terms
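Because tmResult makes the question hard to reproduce, here is a small hypothetical stand-in with the same shape. topicmodels::posterior() returns a list whose terms element is a K x V matrix: one row per topic, one column per term, and each row is that topic's probability distribution over the vocabulary. The terms and values below are invented for illustration only.

```r
# Hypothetical stand-in for tmResult$terms: 3 topics x 6 terms.
# Values are invented; each row sums to 1 like a real topic-term distribution.
term_probs <- matrix(
  c(0.35, 0.25, 0.20, 0.10, 0.06, 0.04,
    0.02, 0.08, 0.12, 0.18, 0.25, 0.35,
    0.15, 0.05, 0.40, 0.03, 0.30, 0.07),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste("Topic", 1:3),
                  c("tax", "war", "peace", "job", "school", "health"))
)
dim(term_probs)      # 3 6  (K topics x V terms)
rowSums(term_probs)  # each topic's probabilities sum to 1
```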
top_words_vector <- c() # empty container for the 8 x 40 = 320 top words (top 40 per topic)
for (i in 1:K) {
  top_words_vector <- c(top_words_vector, sort(tmResult$terms[i, ], decreasing = TRUE)[1:40])
}
top_words_vector
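Because c() keeps element names, top_words_vector is a named numeric vector: the names are the terms and the values are the probabilities, stored topic by topic in consecutive blocks of 40. A minimal sketch of splitting it back into one piece per topic, using a hypothetical toy matrix (3 topics with the top 3 terms each, instead of 8 and 40):

```r
# Hypothetical toy topic-term matrix (invented values; rows sum to 1)
term_probs <- matrix(
  c(0.35, 0.25, 0.20, 0.10, 0.06, 0.04,
    0.02, 0.08, 0.12, 0.18, 0.25, 0.35,
    0.15, 0.05, 0.40, 0.03, 0.30, 0.07),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste("Topic", 1:3),
                  c("tax", "war", "peace", "job", "school", "health"))
)
n_top <- 3  # 40 in the question

# Same loop as in the question: concatenate each topic's top terms
top_words_vector <- c()
for (i in seq_len(nrow(term_probs))) {
  top_words_vector <- c(top_words_vector,
                        sort(term_probs[i, ], decreasing = TRUE)[1:n_top])
}

# Entries 1..n_top belong to topic 1, the next n_top to topic 2, and so on
topic_id <- rep(seq_len(nrow(term_probs)), each = n_top)
per_topic <- split(top_words_vector, topic_id)
per_topic[[2]]  # named vector: health 0.35, school 0.25, job 0.18
```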
wordcloud() takes the terms and the probabilities as separate arguments, and that is what I am trying to extract from top_words_vector:
mycolors <- brewer.pal(8, "Dark2")
wordcloud(c("apple", "banana"), c(0.8, 0.2), random.order = TRUE, colors = mycolors)
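names(top_words_vector) returns the names of the stored values, i.e. the terms, while the values themselves are the probabilities, so both can be passed to wordcloud() directly: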
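To make this concrete, a small sketch with a hand-made named vector (hypothetical terms and values) showing how names() and unname() separate the two parts. Because the question's loop concatenates all topics, a term that ranks highly in several topics appears more than once in the vector, so a single combined cloud would draw it more than once.

```r
# Hypothetical named vector in the same layout as top_words_vector:
# two topics, top 3 terms each
top_words_vector <- c(tax = 0.35, war = 0.25, peace = 0.20,
                      peace = 0.40, school = 0.30, tax = 0.15)

words <- names(top_words_vector)   # the terms (character vector)
probs <- unname(top_words_vector)  # the probabilities (plain numeric vector)

# Terms that occur in more than one topic's top list
dup_terms <- unique(words[duplicated(words)])
dup_terms  # "peace" "tax"
```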
library(tm)
library(topicmodels)
library(RColorBrewer)
library(wordcloud)
mycolors <- brewer.pal(8, "Dark2")
wordcloud(names(top_words_vector), top_words_vector, random.order = TRUE, colors = mycolors)
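The call above puts all 320 terms into one cloud. Since the goal is one cloud per topic, here is a sketch that keeps each topic's top terms in its own list element and draws a separate cloud for each, writing one PDF file per topic. It uses a hypothetical toy matrix; with the real model you would use tmResult$terms and 40 terms. The output folder (tempdir()) and the file names are placeholders. min.freq = 0 is set explicitly so that no term is filtered by the default threshold of 3, which is larger than any probability.

```r
library(RColorBrewer)
library(wordcloud)

# Hypothetical toy topic-term matrix (invented values; rows sum to 1).
# With the real model, use tmResult$terms instead.
term_probs <- matrix(
  c(0.35, 0.25, 0.20, 0.10, 0.06, 0.04,
    0.02, 0.08, 0.12, 0.18, 0.25, 0.35,
    0.15, 0.05, 0.40, 0.03, 0.30, 0.07),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste("Topic", 1:3),
                  c("tax", "war", "peace", "job", "school", "health"))
)
n_top <- 3  # 40 in the question

# One named vector (terms -> probabilities) per topic
top_words <- lapply(seq_len(nrow(term_probs)), function(i) {
  sort(term_probs[i, ], decreasing = TRUE)[1:n_top]
})

mycolors <- brewer.pal(8, "Dark2")
out_dir <- tempdir()  # placeholder: any writable folder
set.seed(9161)        # word placement is random; fix it for reproducibility

# Draw each topic's cloud into its own PDF file
for (i in seq_along(top_words)) {
  pdf(file.path(out_dir, sprintf("wordcloud_topic_%d.pdf", i)))
  wordcloud(names(top_words[[i]]), top_words[[i]],
            min.freq = 0, random.order = FALSE, colors = mycolors)
  dev.off()
}
```

With random.order = FALSE the most probable terms are plotted first, in the center of each cloud.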