machine-learning vowpal-wabbit matrix-factorization

Vowpal Wabbit: Low-rank matrix factorization?


I have a very basic question. I'd like to do low-rank matrix factorization and I was looking at the Vowpal Wabbit documentation on the topic. My question is:

Is there a difference between these two approaches? (implementation or otherwise)

$ vw --lrq ab5

or

$ vw -q ab --rank 5

Here, a and b are feature namespaces and 5 is the latent-factor dimensionality.
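
For illustration, an input line with two namespaces named a and b could look something like the following (the label and feature names here are made up); the low-rank interaction is then formed between the features of namespace a and those of namespace b:

1 |a user_123 |b movie_456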


Possible follow-up:

If these are equivalent, will --rank also work for higher-order interactions?


Solution

  • Short answer:

    --rank and --lrq are two separate, and very different, implementations of matrix factorization/decomposition in vowpal wabbit.

    "Matrix factorization", sometimes referred to as "Matrix decomposition" is a general term in ML, there are many ways to approximate a matrix using simpler factors (sometimes with loss of info).

    Although they bear some similarities in that they both try to capture the strongest latent interactions between two feature subsets, they are equivalent neither in implementation nor in the quality of the model they produce. Their performance strongly depends on the problem at hand.

    In more detail:

    On data-sets which are hard to generalize from (e.g. movielens 1M, where a user has at most one rating per movie), --lrq seems to perform better. It seems to use better defaults, converges faster, is more efficient, and generates much smaller on-disk models. --rank may perform better on other data-sets where there are more examples per user/item to generalize from.

    You can tell that the two implementations produce different results by running an example, e.g. pick a dataset under the test directory and run the two algos on it:

    vw --lrq aa3       test/train-sets/0080.dat
    

    versus:

    vw --rank 3 -q aa  test/train-sets/0080.dat
    

    Feel free to add: --holdout_off -c --passes 1000 to make them run longer so you can compare run-times between the two.
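    For instance, the longer runs could look roughly like this (same dataset as above, with the extra flags appended):

    vw --lrq aa3      --holdout_off -c --passes 1000 test/train-sets/0080.dat

    vw --rank 3 -q aa --holdout_off -c --passes 1000 test/train-sets/0080.dat
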

    You would note that the two use a different number of features per example (--lrq is more minimalistic and will only look at the subset you explicitly tell it to), and that convergence and the final average loss are better with --lrq. If you store the model with -f modelname, you'd note that it'll be much smaller with --lrq, especially on big data-sets.
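    To compare model sizes concretely, you could write each model to disk and look at the file sizes; the model file names below are arbitrary:

    vw --lrq aa3      -f lrq.model  test/train-sets/0080.dat

    vw --rank 3 -q aa -f rank.model test/train-sets/0080.dat

    ls -l lrq.model rank.model
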

    OTOH, if you try a data-set like test/train-sets/ml100k_small_train in the source tree, with a rank of 10 between the namespaces u (user) and i (item), you would get a better loss with --rank than with --lrq. This shows that which one is better depends on the data-set at hand.
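    For that case the two runs would look roughly like this (rank 10 between namespaces u and i, as described above):

    vw --rank 10 -q ui test/train-sets/ml100k_small_train

    vw --lrq ui10      test/train-sets/ml100k_small_train
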

    Higher-order interactions (e.g. --cubic)

    To your second question: --rank will not allow higher-order interactions. If you try to add --cubic you'll get an error:

    vw (gd_mf.cc:139): cannot use triples in matrix factorization
    

    but it will allow multiple/additional -q (quadratic) interactions.
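    For example, a command along these lines (the namespaces and data file are only illustrative) should be accepted:

    vw --rank 5 -q ab -q ac some_data.vw
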

    --lrq is less fussy, so you may add higher-order interaction options to it.
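    e.g. something like the following (again with illustrative namespaces and data file) combines --lrq with a cubic interaction:

    vw --lrq ab5 --cubic abc some_data.vw
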

    More differences:

    Generally, --lrq is more agnostic and independent of other vw options, while --rank uses its own standalone SGD code and doesn't accept other options (like --normalized or --adaptive). Also, memory requirements for --rank are higher.

    Again, results will depend on the data, additional options, and specific interactions.

    Further reading

    --rank

    --lrq