I'm training a `Doc2Vec` model using the code below, where `tagged_data` is a list of `TaggedDocument` instances I set up earlier:
```python
from gensim.models.doc2vec import Doc2Vec

max_epochs = 40
model = Doc2Vec(alpha=0.025,
                min_alpha=0.001)
model.build_vocab(tagged_data)
for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.iter)
    # decrease the learning rate
    model.alpha -= 0.001
    # fix the learning rate, no decay
    model.min_alpha = model.alpha
model.save("d2v.model")
print("Model Saved")
```
When I later check the model results, they're not good. What might have gone wrong?
Do not call `.train()` multiple times in your own loop that tries to do `alpha` arithmetic. It's unnecessary, and it's error-prone.
Specifically, in the above code, decrementing the original `0.025` `alpha` by `0.001` forty times results in a final `alpha` of `0.025 - 40*0.001 = -0.015`, and `alpha` would also have been negative for many of the training epochs (the last 14 of the 40 `.train()` calls). But a negative `alpha` learning rate is nonsensical: it essentially asks the model to nudge its predictions a little bit in the wrong direction, rather than a little bit in the right direction, on every bulk training update. (Further, since `model.iter`, renamed `model.epochs` in gensim 4, is 5 by default, the above code actually performs `40 * 5 = 200` training passes, which probably isn't the conscious intent. But that will just confuse readers of the code and slow training, not totally sabotage results the way the `alpha` mishandling does.)
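To make that arithmetic concrete, here is a minimal plain-Python replay of the question's `alpha` bookkeeping. It doesn't touch gensim at all, and it assumes (as stated above) that each outer `.train()` call performs 5 passes because `model.iter` was left at its default.

```python
# Plain-Python replay of the question's alpha bookkeeping (gensim not needed).
start_alpha = 0.025
decrement = 0.001
max_epochs = 40
passes_per_call = 5  # assumption: default model.iter in the question's gensim version

alpha = start_alpha
alphas_used = []  # alpha in effect when each outer .train() call starts
for _ in range(max_epochs):
    alphas_used.append(alpha)
    alpha -= decrement  # mirrors `model.alpha -= 0.001`

final_alpha = alpha
# Small tolerance so float rounding right around 0.0 isn't counted as negative.
negative_calls = sum(1 for a in alphas_used if a < -1e-9)

print(f"final alpha after {max_epochs} calls: {final_alpha:.3f}")  # -0.015
print(f"calls that trained with negative alpha: {negative_calls} "
      f"({negative_calls * passes_per_call} passes)")               # 14 (70 passes)
print(f"total training passes: {max_epochs * passes_per_call}")     # 200
```

The 26th call starts at (approximately) zero, which is useless but harmless; every call after it actively pushes the model the wrong way.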
There are other common variants of this error, as well. If the `alpha` were instead decremented by `0.0001`, the 40 decrements would only reduce the final `alpha` to `0.021`, whereas the proper practice for this style of SGD (stochastic gradient descent) with linear learning-rate decay is for the value to end very close to `0.000`. And if users start tinkering with `max_epochs` (it is, after all, a parameter pulled out on top!) but don't also adjust the decrement every time, they are likely to far undershoot or far overshoot `0.000`.
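The coupling between the decrement and `max_epochs` can be sketched in a few lines of plain Python. The helper names `final_alpha` and `decrement_for` are hypothetical, introduced only for this illustration; they just do the subtraction a hand-written loop would do.

```python
def final_alpha(start_alpha, decrement, n_steps):
    """Alpha left after n_steps fixed decrements (the anti-pattern's arithmetic)."""
    return start_alpha - n_steps * decrement


def decrement_for(start_alpha, end_alpha, n_steps):
    """Fixed decrement that would land on end_alpha after exactly n_steps."""
    return (start_alpha - end_alpha) / n_steps


# The two decrements discussed above, both with max_epochs = 40.
print(f"{final_alpha(0.025, 0.0001, 40):.4f}")  # undershoots: ends near 0.021
print(f"{final_alpha(0.025, 0.001, 40):.4f}")   # overshoots: ends near -0.015

# A decrement tuned to reach 0.0 in 40 steps only works while max_epochs stays 40.
d40 = decrement_for(0.025, 0.0, 40)
for max_epochs in (20, 40, 80):
    # 20 stops halfway (about 0.0125); 80 runs well below zero (about -0.025).
    print(max_epochs, f"{final_alpha(0.025, d40, max_epochs):.4f}")
```

Every change to `max_epochs` silently requires recomputing the decrement, which is exactly the bookkeeping `.train()` already does internally.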
So don't use this pattern.
Unfortunately, many bad online examples have copied this anti-pattern from each other, and make serious errors in their own `epochs` and `alpha` handling. Please don't copy their error, and please let their authors know they're misleading people wherever this problem appears.
The above code can be improved with the much-simpler replacement:
```python
from gensim.models.doc2vec import Doc2Vec

max_epochs = 40
model = Doc2Vec()  # of course, if non-default parameters are needed, use them here
# most users won't need to change alpha/min_alpha at all,
# but many will want to use more than the default `epochs=5`
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=max_epochs)
model.save("d2v.model")
```
Here, the `.train()` method will do exactly the requested number of `epochs`, smoothly reducing the internal effective `alpha` from its default starting value to near zero. (It's rare to need to change the starting `alpha`, but even if you wanted to, just setting a new non-default value at initial model creation is enough.)
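For intuition about what "smoothly reducing" means, here is a minimal sketch of a linear learning-rate schedule. It is a simplified stand-in, not gensim's actual code: gensim also lowers `alpha` within each epoch as words are processed, while this only computes the value at the start of each epoch. The defaults `alpha=0.025` and `min_alpha=0.0001` are assumed to match `Doc2Vec`'s, and the helper names are hypothetical.

```python
def linear_alpha(progress, alpha=0.025, min_alpha=0.0001):
    """Learning rate at a given fraction (0.0 to 1.0) of total training progress.

    Assumption: a simplified linear schedule similar in spirit to gensim's
    internal decay; not gensim's implementation.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp so alpha never passes min_alpha
    return alpha - (alpha - min_alpha) * progress


def epoch_start_alphas(epochs, alpha=0.025, min_alpha=0.0001):
    """Alpha at the start of each epoch when decay is spread over `epochs` passes."""
    return [linear_alpha(e / epochs, alpha, min_alpha) for e in range(epochs)]


for epochs in (5, 40):
    schedule = epoch_start_alphas(epochs)
    print(epochs, [round(a, 5) for a in schedule[:3]], "...")
print("end of training:", linear_alpha(1.0))
```

Because the schedule is derived from `epochs` itself, changing the epoch count just stretches or compresses the same decay from `alpha` down to `min_alpha`; it can never go negative.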
Also, note that later calls to `infer_vector()` will reuse the `epochs` specified at the time of model creation. If nothing is specified, the default `epochs=5` will be used, which is often smaller than is best for training or inference. So if you find a larger number of `epochs` (such as 10, 20 or more) is better for training, remember to also use at least the same number of `epochs` for inference. (`.infer_vector()` takes an optional `epochs` parameter, which can override any value set at model construction.)