I have a very basic question. I'd like to do low-rank matrix factorization and I was looking at the Vowpal Wabbit documentation on the topic. My question is:
Is there a difference between these two approaches? (implementation or otherwise)
$ vw --lrq ab5
or
$ vw -q ab --rank 5
Here, a and b are feature namespaces and 5 is the latent-factor dimensionality.
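For concreteness, a hypothetical vw input line with two namespaces a and b (the feature names are made up) might look like:
1 |a user_7 age_30 |b item_42 genre_comedy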
Possible follow-up:
If these are equivalent, will --rank also work for higher-order interactions?
--rank and --lrq are two separate and very different implementations of matrix factorization/decomposition in vowpal wabbit.
"Matrix factorization", sometimes referred to as "Matrix decomposition" is a general term in ML, there are many ways to approximate a matrix using simpler factors (sometimes with loss of info).
Although they bear some similarities, in that both try to capture the strongest latent interactions between two feature subsets, they are equivalent neither in implementation nor in the quality of the models they produce. Their relative performance strongly depends on the problem at hand.
--rank was the first implementation of MF in vw, by Jake Hofman. It was inspired by SVD (Singular Value Decomposition).

--lrq was implemented a few years later by Paul Mineiro. It was inspired by libfm.

On data-sets which are hard to generalize from (e.g. movielens 1M, where a user has at most one rating per movie), --lrq seems to perform better: it appears to use better defaults, converges faster, is more efficient, and generates much smaller on-disk models.

--rank may perform better on other data-sets where there are more examples per user/item to generalize from.
You can tell the two implementations produce different results by running an example; e.g. pick a dataset under the test directory and run the two algos on it:
vw --lrq aa3 test/train-sets/0080.dat
versus:
vw --rank 3 -q aa test/train-sets/0080.dat
Feel free to add --holdout_off -c --passes 1000 to make them run longer so you can compare run-times between the two.
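For example, the longer-running variants might look like this (simply combining the flags above):
vw --lrq aa3 -c --passes 1000 --holdout_off test/train-sets/0080.dat
vw --rank 3 -q aa -c --passes 1000 --holdout_off test/train-sets/0080.dat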
You would note that the two use a different number of features per example (--lrq is more minimalistic and will only look at the subset you explicitly tell it to), and that both the convergence and the final average loss are better with --lrq. If you store the model with -f modelname, you'd note it is much smaller with --lrq, especially on big data-sets.
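For instance, with hypothetical model filenames, you could compare the on-disk sizes directly:
vw --lrq aa3 -f lrq.model test/train-sets/0080.dat
vw --rank 3 -q aa -f rank.model test/train-sets/0080.dat
ls -l lrq.model rank.model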
OTOH, if you try a data-set like test/train-sets/ml100k_small_train in the source tree, with a rank of 10 between the namespaces u (user) and i (item), you would get a better loss with --rank than with --lrq. This shows that which one is better depends on the data-set at hand.
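A sketch of that comparison, assuming the rank-10 setup between u and i described above:
vw --rank 10 -q ui test/train-sets/ml100k_small_train
vw --lrq ui10 test/train-sets/ml100k_small_train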
To your second question: --rank will not allow higher-order interactions (such as --cubic). If you try to add --cubic you'll get an error:
vw (gd_mf.cc:139): cannot use triples in matrix factorization
but it will allow multiple/additional -q (quadratic) interactions.
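For example (the data file train.dat and the extra namespaces here are hypothetical placeholders):
vw --rank 5 -q ab --cubic abc train.dat   # fails with the error above
vw --rank 5 -q ab -q ac train.dat         # multiple quadratic interactions are fine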
--lrq is less fussy, so you may add higher-order interaction options to it.
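For instance, the same hypothetical --cubic interaction that --rank rejects should be accepted alongside --lrq:
vw --lrq ab5 --cubic abc train.dat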
Generally, --lrq
is more agnostic and independent of other vw
options while --rank
is using its own standalone SGD code and doesn't accept other options (like --normalized
, or --adaptive
). Also, memory requirements for --rank
are higher.
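As a sketch, the SGD-related flags just mentioned should be accepted alongside --lrq (train.dat is again a placeholder; whether they actually help is data-dependent):
vw --lrq ab5 --adaptive --normalized train.dat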
Again, results will depend on the data, additional options, and specific interactions.
For more details, see:
The --lrq demo in the source tree.
libfm (by Steffen Rendle), after which --lrq was designed, with many further references.