I have a working logistic regression classifier using the one-against-all (oaa) method. Although I'm currently training the classifier to recognize 15 classes, in the future I would like to feed it examples from N additional classes that I would like my classifier to learn. However, vowpal wabbit commands using the --save_resume option do not allow me to use --oaa to specify a new total number of classes.
I use the oaa option because when I make predictions I want to select the top 3 predicted classes that have the highest probability of being true, which I determine using the --probabilities option.
How can I teach additional classes to my classifier when using --oaa and --save_resume?
I initially train my classifier using:
vw --oaa=15 --loss_function=logistic --save_resume -c --passes 10 -d /tmp/train.vw -f /tmp/model.vw
I resume training using:
vw --loss_function=logistic --save_resume -c --passes 10 -d /tmp/train.vw -i /tmp/model.vw -f /tmp/model.v
I make predictions using:
vw -t --probabilities --loss_function=logistic -d /tmp/test.vw -i /tmp/model.vw -p /tmp/predict.vw
I then examine predict.vw and select the classes with the top 3 highest probabilities of being true.
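For reference, the selection step can be sketched as below. This is only a minimal sketch: it assumes each line of the predictions file is a space-separated list of class:probability pairs (for example "1:0.1 2:0.6 3:0.3"), and the function name top3 is made up for illustration.

```shell
# Print the three class:probability pairs with the highest probability
# for every line of a predictions file.
# Assumption: each line looks like "1:0.1 2:0.6 3:0.05 ...".
top3() {  # $1 = predictions file
  awk '{
    split("", used)                        # reset the picked set for this line
    out = ""
    for (r = 1; r <= 3; r++) {
      best = 0
      for (i = 1; i <= NF; i++) {
        if (index($i, ":") == 0) continue  # skip fields that are not class:prob (e.g. a tag)
        split($i, kv, ":")
        if (!(i in used) && (best == 0 || kv[2] + 0 > bestp)) {
          best = i
          bestp = kv[2] + 0
        }
      }
      if (best == 0) break                 # fewer than three pairs on this line
      used[best] = 1
      out = out (r > 1 ? " " : "") $best
    }
    print out
  }' "$1"
}
```

Calling top3 /tmp/predict.vw prints one line per test example, with the most probable class first.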
Currently, it is not possible to increase the number N of classes in --oaa N when training in multiple steps with --save_resume. Internally, the model uses N for offsetting the weight vector, so you would need to hack the loading of the model.
You can try setting N high enough from the beginning, using only classes 1-15 in the first steps and adding classes with higher numbers in the later steps. Thanks to the nature of online training, the later examples influence the model more.
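A rough sketch of that workaround, reusing the flags from the question. The ceiling of 30 and the function names train_initial and train_more are assumptions for illustration, not anything vw prescribes.

```shell
# Hypothetical ceiling on the number of classes. It has to cover every class
# you ever expect to add, because it cannot be raised later.
MAX_CLASSES=30

# First round: the data only contains labels 1-15, but --oaa reserves room for more.
train_initial() {  # $1 = training data, $2 = output model
  vw --oaa "$MAX_CLASSES" --loss_function=logistic --save_resume \
     -c --passes 10 -d "$1" -f "$2"
}

# Later rounds: --oaa is not repeated, as in the resume command from the question;
# the class count comes from the loaded model.
train_more() {  # $1 = new training data, $2 = input model, $3 = output model
  vw --loss_function=logistic --save_resume \
     -c --passes 10 -d "$1" -i "$2" -f "$3"
}
```

You would call train_initial once on the 15-class data, then train_more on each later batch that introduces labels 16 and up, passing the previous model as input.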
Alternatively, with csoaa_ldf you can specify the number of classes on the fly: different classes may be available for each example.
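To give an idea of what the csoaa_ldf input could look like, here is a hedged sketch that converts plain multiclass lines into multi-line examples. It assumes input lines of the form "LABEL | features" with no namespace, gives the true class cost 0 and every other class cost 1, and uses made-up namespace names (s, c) and feature names (class1, class2, ...). The function name to_ldf is also hypothetical.

```shell
# Convert "LABEL | features" lines into multi-line csoaa_ldf examples.
# Each example becomes: one shared line with the original features, one
# "class:cost" line per currently known class, and a blank separator line.
# A training call might then look like
#   vw --csoaa_ldf mc -q sc -d data.ldf -f model.vw
# (not verified here; check the options against your vw version).
to_ldf() {  # $1 = number of classes known so far; reads stdin, writes stdout
  awk -v n="$1" '{
    label = $1 + 0
    feats = $0
    sub(/^[^|]*\|/, "", feats)             # keep only the text after the first "|"
    print "shared |s" feats
    for (k = 1; k <= n; k++)
      printf "%d:%d |c class%d\n", k, (k == label ? 0 : 1), k
    print ""                               # blank line ends the example
  }'
}
```

Because the candidate classes are listed per example, a later batch can simply be converted with a larger class count. Whether the probability output you get this way is comparable to what --oaa with --probabilities produces would need to be checked separately.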