I'm learning a topic model from a set of documents and that's working well. But I'm wondering if any existing system will actually generate new documents from the topics and words in the model.
That is, say I want a new document for topic 0: will Gensim, MALLET, or any other tool actually produce a new document given my choice of topic (or topics) as input? Or is this a roll-your-own kind of problem?
Say I have two topics:
topic #0: 0.009*river + 0.008*lake + 0.006*island + 0.005*mountain + 0.004*area + 0.004*park + 0.004*antarctic + 0.004*south + 0.004*mountains + 0.004*dam
topic #1: 0.026*relay + 0.026*athletics + 0.025*metres + 0.023*freestyle + 0.022*hurdles + 0.020*ret + 0.017*divisão + 0.017*athletes + 0.016*bundesliga + 0.014*medals
Is there any tool that will take "topic 0: .5, topic 1: .5, length: 7" and nicely produce a document like:
island freestyle river south medals mountains area
or something along those lines? I don't want to duplicate this if it already exists.
Have you read the developer's guide and tutorials on the MALLET website? They show how to build a document that has a high probability of belonging to a certain topic:
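// Assumes, as in the MALLET example, a trained ParallelTopicModel named model, with
// dataAlphabet = instances.getDataAlphabet() and topicSortedWords = model.getSortedWords()
StringBuilder topicZeroText = new StringBuilder();
// Words of topic 0, sorted by descending weight
Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();
int rank = 0;
// Append the five highest-weighted words, separated by spaces
while (iterator.hasNext() && rank < 5) {
    IDSorter idCountPair = iterator.next();
    topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID())).append(" ");
    rank++;
}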
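This code builds a new document from the five highest-weighted words of topic 0, so it has a high probability of being assigned to topic 0. Note that it always picks the same top words rather than sampling at random. It can easily be modified to draw on more than one topic and to produce a document of a given length.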
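If you do end up rolling your own, the modification could look roughly like the sketch below. It is plain Java with no MALLET dependency, and it is only an illustration: the class name TopicDocumentSampler and the methods sample and generate are made up for this example, and the topic-word weights are a hand-copied subset of the ones in the question (in practice you would fill the maps from your trained model). For each position it first draws a topic according to the requested mixture, then draws a word from that topic in proportion to its weight.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of MALLET or Gensim.
public class TopicDocumentSampler {

    // Draws one key from a non-empty map of non-negative weights,
    // with probability proportional to its weight (weights need not sum to 1).
    static <K> K sample(Map<K, Double> weights, Random rng) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        K last = null;
        for (Map.Entry<K, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0) {
                return last;
            }
        }
        return last; // fallback for floating-point round-off
    }

    // Generates `length` words: pick a topic from topicMix, then a word from that topic.
    static List<String> generate(Map<Integer, Double> topicMix,
                                 Map<Integer, Map<String, Double>> topicWords,
                                 int length, Random rng) {
        List<String> doc = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            int topic = sample(topicMix, rng);
            doc.add(sample(topicWords.get(topic), rng));
        }
        return doc;
    }

    public static void main(String[] args) {
        // Weights copied from the question (top words only, so they are renormalized when sampling).
        Map<String, Double> topic0 = new LinkedHashMap<>();
        topic0.put("river", 0.009);
        topic0.put("lake", 0.008);
        topic0.put("island", 0.006);
        topic0.put("mountain", 0.005);

        Map<String, Double> topic1 = new LinkedHashMap<>();
        topic1.put("relay", 0.026);
        topic1.put("athletics", 0.026);
        topic1.put("metres", 0.025);
        topic1.put("freestyle", 0.023);

        Map<Integer, Map<String, Double>> topicWords = new LinkedHashMap<>();
        topicWords.put(0, topic0);
        topicWords.put(1, topic1);

        // "topic 0: .5, topic 1: .5, length: 7"
        Map<Integer, Double> topicMix = new LinkedHashMap<>();
        topicMix.put(0, 0.5);
        topicMix.put(1, 0.5);

        List<String> doc = generate(topicMix, topicWords, 7, new Random());
        System.out.println(String.join(" ", doc));
    }
}
```

Because only the listed top words are included, the rest of each topic's vocabulary can never appear; using the full per-topic word distribution from the model would give documents closer to what the model itself would generate.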