Suppose that I want to optimize a vector v under the constraint that its norm is equal to 1. To do that, I defined a network with that vector as a parameter:
import torch
import torch.nn as nn

class myNetwork(nn.Module):
    def __init__(self, initial_vector):
        super(myNetwork, self).__init__()
        # Register the initial column vector as a learnable parameter
        self.v = nn.Parameter(initial_vector)

    def forward(self, x):
        # Normalize v in place so that its norm is equal to 1
        self.v.data = self.v.data / torch.sqrt(self.v.data.transpose(1, 0) @ self.v.data)
        # Multiply a batch of row vectors by v
        out = x @ self.v
        return out
Is using .data the best way to update v? And is the normalization taken into account during backpropagation?
You could simply perform the normalization inside forward, without going through .data:

def forward(self, x):
    # the division is part of the computation graph,
    # so backpropagation takes the normalization into account
    return x @ self.v / torch.sqrt((self.v**2).sum())
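As a quick sanity check (a minimal sketch assuming the forward above, with an arbitrary 3x1 initial vector and a dummy loss), the gradient of v now includes the normalization, whereas anything done through .data is invisible to autograd:

import torch

torch.manual_seed(0)
net = myNetwork(torch.randn(3, 1))   # arbitrary 3x1 initial vector
x = torch.randn(5, 3)                # batch of 5 row vectors

out = net(x)                         # normalization happens inside the graph
loss = out.sum()                     # dummy loss, just to call backward()
loss.backward()

print(net.v.grad)                    # gradient flows through the normalization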
Depending on your loss, or on the downstream layers of your network, you could even skip normalization altogether:

def forward(self, x):
    # leave v unnormalized; rely on a scale-invariant loss instead
    return x @ self.v
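For example (a sketch with hypothetical x and target tensors, assuming the unnormalized forward above), a cosine-similarity loss does not change if v is rescaled by any positive factor:

import torch
import torch.nn.functional as F

net = myNetwork(torch.randn(3, 1))
x = torch.randn(5, 3)
target = torch.randn(5)                  # hypothetical regression target

out = net(x).squeeze(1)                  # (5,) vector, proportional to the scale of v
loss = 1 - F.cosine_similarity(out, target, dim=0)  # invariant to v -> c*v for c > 0
loss.backward()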
This works as long as your loss is invariant to the scale of v: the norm will drift only slightly at each step, but it is not strictly stable. If you are taking many steps, it may be worth adding a term tiny_value * ((net.v**2).sum() - 1)**2 (where net is your myNetwork instance) to your loss, to make sure that the norm of v is attracted to 1.
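A minimal sketch of that regularized setup (tiny_value, x, and target are placeholders, reusing the cosine loss from above):

import torch
import torch.nn.functional as F

net = myNetwork(torch.randn(3, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)
tiny_value = 1e-3                        # placeholder weight for the norm penalty

x = torch.randn(5, 3)
target = torch.randn(5)

for step in range(100):
    main_loss = 1 - F.cosine_similarity(net(x).squeeze(1), target, dim=0)
    penalty = tiny_value * ((net.v**2).sum() - 1)**2  # attracts the norm of v to 1
    loss = main_loss + penalty

    opt.zero_grad()
    loss.backward()
    opt.step()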