Here is part of the get_updates code from SGD in Keras (source):
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
The moments variable is a list of zero tensors, so each m in the for loop is a zero tensor with the shape of p. That means self.momentum * m, on the first line of the loop, is just a scalar multiplied by a zero tensor, which results in a zero tensor.
What am I missing here?
Yes, the first time the update runs, m is equal to 0. But m is a persistent variable, and it is then overwritten with the current value of v by this line:
self.updates.append(K.update(m, v))
So on the next update step you'll have:
v = self.momentum * old_velocity - lr * g # velocity
where old_velocity is the previous value of v.
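To see the accumulation concretely, here is a minimal plain-Python sketch of the same rule on a single scalar parameter. It is not Keras code: the names sgd_momentum_steps and grad_fn are made up for illustration, and the parameter update p + v is assumed to be the non-Nesterov rule, which the excerpt above does not show.

```python
# Minimal sketch of SGD with momentum on one scalar parameter.
# sgd_momentum_steps and grad_fn are hypothetical names, not Keras API.

def sgd_momentum_steps(p, grad_fn, lr=0.1, momentum=0.9, steps=3):
    m = 0.0  # persistent state, starts at zero like K.zeros(shape)
    history = []
    for _ in range(steps):
        g = grad_fn(p)
        v = momentum * m - lr * g  # same formula as in get_updates
        m = v                      # plays the role of K.update(m, v)
        p = p + v                  # assumed non-Nesterov parameter update
        history.append(v)
    return p, history


# A constant gradient of 1.0 makes the growth of the velocity easy to see.
p_final, velocities = sgd_momentum_steps(0.0, lambda p: 1.0)
print(velocities)
```

On the first step the momentum term contributes nothing, so v is just -lr * g (about -0.1). On the following steps m holds the previous velocity, so the values grow in magnitude to roughly -0.19 and then -0.271.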