vowpalwabbit

Vowpal Wabbit: specifying the learning rate schedule


I'm looking at VW's docs for update rule options, and I'm confused about the equation that specifies the learning rate schedule using the parameters initial_t, power_t, and decay_learning_rate.

Based on the equation below this line in the docs

specify the learning rate schedule whose generic form
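If I'm reading the wiki correctly (my reconstruction, so the exact symbols may be off), that generic form is:

```latex
\eta_{e,t} = \lambda \, d^{\,e} \left( \frac{t_0}{t_0 + w_t} \right)^{p}
```

where \(\lambda\) = `--learning_rate`, \(d\) = `--decay_learning_rate`, \(e\) = the epoch number, \(t_0\) = `--initial_t`, \(p\) = `--power_t`, and \(w_t\) is the sum of importance weights seen so far.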

if initial_t is equal to zero (which is the default setting), it seems that the learning rate will always be zero, for all timesteps and epochs. Is this right?

Also, what would happen if both initial_t and power_t are set to zero? I tried initializing a VW with those settings and it didn't complain.


Solution

  • if initial_t is equal to zero (which is the default setting), it seems that the learning rate will always be zero, for all timesteps and epochs. Is this right?

    initial_t is indeed set to zero by default, but by default the initial learning rate is not computed from initial_t at all; it simply starts at its own default value, which is 0.5.

    Per the documentation, the flags adaptive, normalized, and invariant are all on by default; if you explicitly specify any one of them, the others are turned off. If you turn on only the invariant flag (so neither normalized nor adaptive is in use), the initial learning rate is calculated from the initial_t and power_t values, and in that mode the default initial_t is one instead of zero.

    If initial_t is explicitly set to zero and the invariant flag is set, then yes, the learning rate will indeed be zero.
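    As a quick sanity check, here is a small Python sketch of that schedule (my own reconstruction of the formula from the wiki, not VW's actual source), showing why initial_t = 0 pins the rate to zero:

    ```python
    def eta(t, lr=0.5, decay=1.0, epoch=0, initial_t=1.0, power_t=0.5):
        """Learning rate at timestep t, per the (assumed) schedule:
        eta = lr * decay**epoch * (initial_t / (initial_t + t))**power_t
        where t stands in for the sum of importance weights seen so far."""
        return lr * decay**epoch * (initial_t / (initial_t + t)) ** power_t

    # With initial_t = 0 the numerator is zero, so the rate is zero for any t > 0:
    print(eta(t=10, initial_t=0.0))   # 0.0
    print(eta(t=100, initial_t=0.0))  # 0.0

    # With the default initial_t = 1 the rate starts at lr and decays like t**-0.5:
    print(eta(t=0))  # 0.5
    ```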

    Also, what would happen if both initial_t and power_t are set to zero? I tried initializing a VW with those settings and it didn't complain.

    If the initial learning rate is calculated from initial_t and power_t and both are explicitly set to zero, C++ evaluates powf(0, 0) as 1, so the learning rate falls back to its default value, which can be overridden with --learning_rate.
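    You can check the powf(0, 0) behavior directly; the C standard defines pow(0, 0) to return 1:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* pow(0, 0) == 1 per the C standard, so the schedule's
           (initial_t / (initial_t + t))^power_t factor becomes 1
           rather than 0 or NaN when both parameters are zero. */
        float r = powf(0.0f, 0.0f);
        printf("powf(0, 0) = %f\n", r); /* prints 1.000000 */
        assert(r == 1.0f);
        return 0;
    }
    ```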

    If you are running Vowpal Wabbit from the command line, you can see what these values are set to in the startup banner:

    Num weight bits = 18
    learning rate = 10
    initial_t = 1
    power_t = 0.5
    decay_learning_rate = 1