python scikit-learn tf-idf matrix-decomposition nmf

Applying a matrix decomposition for classification using a saved W matrix


I'm performing an NMF decomposition on a tf-idf input in order to perform topic analysis.

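from sklearn import decomposition

def decomp(tfidfm, topic_count):
    # Factorize the tf-idf matrix into topic_count non-negative topics.
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    H = model.fit_transform(tfidfm)  # document-to-topic weights
    W = model.components_            # topic-to-term weights
    return W, H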

This returns W, a model definition consisting of topic-to-term assignments, and H, a document-to-topic assignment matrix.

So far so good: I can use H to classify documents based on their association, via term frequency, with a list of topics, which are in turn defined by their association with term frequency.

I'd like to save the topic-term associations to disk so I can reapply them later, and have adopted the method described here [https://stackoverflow.com/questions/8955448] to store the sparse-matrix representation of W.

What I'd like to do now is perform the same process, but with the topic-definition matrix W held fixed.

From the documentation, it appears that I can pass W as a parameter in the call, something along the lines of:

def applyModel(tfidfm,W,topic_count):
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    H = model.fit_transform(X=tfidfm, W=W)
    W = model.components_
    return W, H

And I've tried this, but it doesn't appear to work.

I tested this by building a W matrix from a differently sized vocabulary and feeding it into the applyModel function. I intended the shape of the resulting matrices to be defined by the W model, but this isn't the case.

The short version of this question is: How can I save the topic model generated from a matrix decomposition so that I can use it to classify a different document set from the one originally used to generate it?

In other terms, if V=WH, then how can I return H, given V and W?


Solution
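
For the scikit-learn workflow in the question, one option that may be simpler than passing W into fit_transform is to keep the fitted model and call its transform method on the new documents. As far as I know, the W and H arguments of fit_transform only serve as initial guesses when init="custom"; they are not held fixed. The sketch below is a minimal illustration, not a definitive implementation: the toy corpora and the variable names (train_docs, new_docs, topic_term, doc_topic_new) are made up for the example. Note also that scikit-learn's own naming is the reverse of the question's: it calls the fit_transform output W and components_ H.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora, for illustration only.
train_docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors buy stocks",
]
new_docs = ["my pets are cats", "bonds for investors"]

# Fit the vocabulary and the topic model once, on the training corpus.
vectorizer = TfidfVectorizer()
train_tfidf = vectorizer.fit_transform(train_docs)
model = NMF(n_components=2, init="nndsvd", max_iter=500)
doc_topic_train = model.fit_transform(train_tfidf)  # documents x topics
topic_term = model.components_.copy()               # topics x terms

# Later: reuse the SAME fitted vectorizer so the columns of the new tf-idf
# matrix line up with the columns of topic_term, then solve only for the
# document-to-topic weights. transform() leaves components_ unchanged.
new_tfidf = vectorizer.transform(new_docs)
doc_topic_new = model.transform(new_tfidf)           # new documents x topics
```

The key constraint is the vocabulary: a topic-term matrix built from one vocabulary cannot be applied to a tf-idf matrix built from another, which is probably why the differently sized vocabulary test in the question did not behave as intended. To reuse the model across sessions, the fitted vectorizer and the fitted model would both need to be persisted, for example with pickle or joblib.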

  • The initial equation is $V = WH$, and we solve it for H by multiplying both sides on the left by the inverse of W: $W^{-1}V = W^{-1}WH = H$, so $H = W^{-1}V$.

    Here $W^{-1}$ denotes the inverse of the matrix W, which exists only if W is nonsingular (and therefore square).

    The multiplication order is, as always, important. If you had $V = HW$ instead, you'd need to multiply V by the inverse of W the other way round: $H = VW^{-1}$.
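
The algebra above can be checked numerically. The following is a small NumPy sketch under stated assumptions: the matrices are made-up examples, and the last part swaps in the Moore-Penrose pseudo-inverse (np.linalg.pinv), which is not part of the answer above, to cover the common case where W is not square and so has no ordinary inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Square, nonsingular W: the inverse exists, so H = W^{-1} V.
W = np.array([[2.0, 1.0],
              [1.0, 3.0]])
H_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])
V = W @ H_true
# solve(W, V) computes W^{-1} V without forming the inverse explicitly.
H = np.linalg.solve(W, V)

# Reversed order, V2 = H2 W: multiply by the inverse on the right instead.
H2_true = np.array([[1.0, 2.0],
                    [0.0, 1.0],
                    [3.0, 1.0]])
V2 = H2_true @ W
H2 = V2 @ np.linalg.inv(W)

# Non-square W (assumed full column rank): no inverse exists, so use the
# pseudo-inverse, which gives the least-squares solution for H.
W_tall = rng.random((5, 2))        # 5 x 2
V3 = W_tall @ H_true               # 5 x 3
H3 = np.linalg.pinv(W_tall) @ V3   # 2 x 3
```

One caveat on the design choice: neither the inverse nor the pseudo-inverse enforces non-negativity, so on real data the recovered H can contain negative entries, which an NMF-based solver would not produce.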