I'm performing an NMF decomposition on a tf-idf input in order to perform topic analysis.
from sklearn import decomposition

def decomp(tfidfm, topic_count):
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    H = model.fit_transform(tfidfm)  # document-to-topic assignments
    W = model.components_            # topic-to-term assignments
    return W, H
This returns W, a topic-to-term assignment matrix (the model definition), and H, a document-to-topic assignment matrix.
So far so good: I can use H to classify documents by their association, via term frequency, with a list of topics, which are themselves defined by their association with term frequency.
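For context, here is a minimal sketch of how decomp might be driven end to end; the document list and topic count are hypothetical, and the shapes follow scikit-learn's convention (fit_transform returns the document-to-topic matrix, components_ holds topic-to-term weights):

```python
from sklearn import decomposition
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats and pets",
    "stock markets fell sharply",
    "markets rallied on earnings",
]
tfidfm = TfidfVectorizer().fit_transform(docs)

model = decomposition.NMF(init="nndsvd", n_components=2, max_iter=500)
H = model.fit_transform(tfidfm)  # shape: (n_documents, n_topics)
W = model.components_            # shape: (n_topics, n_terms)
```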
I'd like to save the topic-term associations to disk so I can reapply them later - and have adopted the method described here [https://stackoverflow.com/questions/8955448] to store the sparse-matrix representation of W.
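As a concrete sketch of that round trip (note that components_ is actually a dense ndarray, so it is converted to CSR before saving; the file name is hypothetical):

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical topic-to-term matrix, standing in for model.components_
W = np.array([[0.0, 1.2, 0.0, 0.4],
              [0.5, 0.0, 0.9, 0.0]])

path = os.path.join(tempfile.mkdtemp(), "topic_terms.npz")
sparse.save_npz(path, sparse.csr_matrix(W))   # write sparse representation
W_loaded = sparse.load_npz(path).toarray()    # read it back as a dense array
```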
So what I'd like to do now, is perform the same process, only fixing the topic-definition matrix W.
From the documentation, it appears that I can pass W in the calling parameters, something along the lines of:
def applyModel(tfidfm, W, topic_count):
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    H = model.fit_transform(X=tfidfm, W=W)
    W = model.components_
    return W, H
I've tried this, but it doesn't appear to work.
I tested it by compiling a W matrix from a differently sized vocabulary and feeding that into the applyModel function. The shapes of the resulting matrices should be determined by the W model (or at least, that is my intention), but this isn't the case.
The short version of this question is: How can I save the topic-model generated from a matrix decomposition, such that I can use it to classify a different document set than the one used to originally generate it?
In other terms, if V=WH, then how can I return H, given V and W?
The initial equation is V = WH, and we solve it for H like this: H = W⁻¹V.
Here W⁻¹ denotes the inverse of the matrix W, which exists only if W is nonsingular.
The multiplication order is, as always, important. If you had V = HW, you'd need to multiply by the inverse of W the other way round: H = VW⁻¹.
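In practice W is rarely square, so the Moore-Penrose pseudoinverse is the usual stand-in for the inverse. A minimal sketch with NumPy, using small hypothetical shapes, following the V = WH convention from above:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((4, 3))        # hypothetical 4 x 3 factor (full column rank)
H_true = rng.random((3, 5))   # hypothetical 3 x 5 factor
V = W @ H_true                # V = WH

# Recover H from V and W: H = pinv(W) @ V
# (equals W⁻¹V when W is square and nonsingular)
H = np.linalg.pinv(W) @ V
```

For V = HW, the order flips accordingly: H = V @ np.linalg.pinv(W).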