I'm looking at VW's docs for update rule options, and I'm confused about the equation that specifies the learning rate schedule using the parameters `initial_t`, `power_t`, and `decay_learning_rate`.
Based on the equation that follows the line "specify the learning rate schedule whose generic form" in the docs, if `initial_t` is equal to zero (which is the setting by default), it seems that the learning rate will always be zero, for all timesteps and epochs. Is this right?
Also, what would happen if both `initial_t` and `power_t` are set to zero? I tried initializing a VW with those settings and it didn't complain.
> if initial_t is equal to zero (which is the setting by default), it seems that the learning rate will always be zero, for all timesteps and epochs. Is this right?
`initial_t` is set to zero by default. By default, however, the initial learning rate is not calculated from `initial_t`; it simply starts at its default value, which is `0.5`.
Per the documentation, the flags `adaptive`, `normalized`, and `invariant` are on by default. If any of them is specified, the other flags are turned off. If you turn on only the `invariant` flag (that is, neither `normalized` nor `adaptive` is in use), the initial learning rate is calculated from the `initial_t` and `power_t` values, and `initial_t` defaults to one instead of zero.
If `initial_t` is explicitly set to zero and the `invariant` flag is set, then yes, the learning rate will also be zero.
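To make the zero case concrete, here is a minimal Python sketch of the schedule. It assumes the generic form in the docs is eta_t = lambda * d^k * (t0 / (t0 + w_t))^p, where w_t is the sum of importance weights seen so far. The helper name `learning_rate` and its parameter names are made up for this illustration and are not part of VW's API, and VW's internal computation may differ in detail.

```python
def learning_rate(base_rate, decay, epoch, initial_t, power_t, weight_sum):
    """Hypothetical helper, not VW code.

    Assumed form: base_rate * decay**epoch * (initial_t / (initial_t + weight_sum))**power_t
    """
    ratio = initial_t / (initial_t + weight_sum)
    return base_rate * decay ** epoch * ratio ** power_t


# weight_sum starts at 1.0 so the denominator is never zero when initial_t is 0.
weight_sums = (1.0, 10.0, 100.0)

# With initial_t = 0 the numerator is zero, so the rate is zero at every step.
rates_zero_t0 = [learning_rate(0.5, 1.0, 0, 0.0, 0.5, w) for w in weight_sums]

# With initial_t = 1 the rate is positive and shrinks as more weight is seen.
rates_one_t0 = [learning_rate(0.5, 1.0, 0, 1.0, 0.5, w) for w in weight_sums]

print(rates_zero_t0)
print(rates_one_t0)
```

Under that assumed form, only a positive `initial_t` gives a usable rate when `power_t` is nonzero.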
> Also, what would happen if both initial_t and power_t are set to zero? I tried initializing a VW with those settings and it didn't complain.
If the initial learning rate is calculated using `initial_t` and `power_t` and both are explicitly set to zero, C++ should evaluate `powf(0,0)` to `1`, which leaves the learning rate at its default value or at whatever was specified with `--learning_rate`.
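The zero-to-the-zero behaviour can be checked outside VW. The sketch below uses Python's `math.pow`, which follows the same C convention of returning 1.0 whenever the exponent is zero. The variable names are illustrative, and scaling the base rate by `initial_t` raised to `power_t` is an assumption about how the initial rate is formed, not a copy of VW's code.

```python
import math

base_rate = 0.5   # stand-in for the value passed via --learning_rate
initial_t = 0.0
power_t = 0.0

# pow(0, 0) is defined as 1.0, so no error or NaN is produced here.
scale = math.pow(initial_t, power_t)

# Assumed scaling step: multiplying by 1.0 leaves the base rate unchanged.
initial_rate = base_rate * scale

print(scale, initial_rate)
```

This would also explain why VW accepts both settings at zero without complaint: the scaling factor is a perfectly ordinary 1.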
If you are running vowpalwabbit via the command line, you should be able to see what these values are set to:
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1