My use case is to retrain and make predictions with VW contextual bandits (CB) in batch mode (retraining/inference happens nightly).
I'm reading this tutorial for offline policy evaluation in the batch scenario. I'm training on a logged dataset using:
--cb_adf --save_resume -f {MODEL_PATH} -d ./data/train.txt
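For reference, here is a minimal sketch of the cb_adf multiline format that train.txt contains (the feature names below are made up): each example block has an optional shared line for context features and one line per candidate action, with an action:cost:probability label on the line of the action that was actually taken, and blocks separated by a blank line.

shared | s_1 s_2
0:1.0:0.5 | a_1 b_1 c_1
| a_2 b_2 c_2
| a_3 b_3 c_3

Here the first action was the one taken, incurring a cost of 1.0, and it was logged with probability 0.5.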
To tune the epsilon hyperparameter on batch predictions, I run the following command three times (once per epsilon value) on a separate dataset:
-i {MODEL_PATH} -t --cb_explore_adf --epsilon 0.1/0.2/0.3 -d ./data/eval.txt
Whichever gives the lowest average loss is taken as the optimal epsilon.
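Concretely, the sweep looks something like this (a sketch assuming the vw binary is invoked directly; {MODEL_PATH} is a placeholder as above):

# evaluate the frozen model under three exploration levels and
# compare the "average loss" each run reports at the end
for eps in 0.1 0.2 0.3; do
  vw -i {MODEL_PATH} -t --cb_explore_adf --epsilon $eps -d ./data/eval.txt
done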
Am I using the right options? My confusion mostly comes from another option, --explore_eval. What is the difference between --explore_eval and --cb_explore_adf, and what is the right way to evaluate the model plus exploration offline? Should I just run:
--explore_eval --epsilon 0.1/0.2/0.3 -d ./data/train+eval.txt
and pick whichever epsilon gives the lowest average loss?
Regarding the first approach:

-i {MODEL_PATH} -t --cb_explore_adf --epsilon 0.1/0.2/0.3 -d ./data/eval.txt
I predict the result of this experiment: the optimal epsilon will be the smallest one. This is because, after the data has been collected, there is no value to exploration: on a fixed dataset, a larger epsilon only mixes more random actions into the greedy choice, which generally just raises the measured loss. To assess exploration, you have to change the data available at training time in a manner sensitive to the exploration algorithm. Which brings us to ...
--explore_eval --epsilon 0.1/0.2/0.3 -d ./data/train+eval.txt
'--explore_eval' is designed to assess exploration. It requires more data to work well (it discards an example whenever the logged action does not match what the exploration algorithm would have done), but it allows you to evaluate the exploration itself, since it simulates the fog of war.
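So a sweep along these lines (again a sketch assuming the vw binary), picking the epsilon with the lowest reported average loss, is the right way to tune exploration offline:

# simulate learning under each exploration level on the combined data
for eps in 0.1 0.2 0.3; do
  vw --explore_eval --epsilon $eps -d ./data/train+eval.txt
done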
If you are testing other model hyperparameters, such as the base learning algorithm or feature interactions, the extra data overhead of '--explore_eval' is unnecessary.