I tried to implement a siamese network for an image classification task according to the code below:
class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        # Setting up the Sequential of CNN Layers
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=1),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),
            nn.Dropout2d(p=0.3),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Dropout2d(p=0.3),
        )
        # Defining the fully connected layers
        self.fc = nn.Sequential(
            nn.Linear(30976, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p=0.5),
            nn.Linear(1024, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 2))

    def forward_once(self, x):
        # Forward pass
        output = self.cnn(x)
        output = output.view(output.size()[0], -1)
        output = self.fc(output)
        return output

    def forward(self, input1, input2):
        # forward pass of input 1
        output1 = self.forward_once(input1)
        # forward pass of input 2
        output2 = self.forward_once(input2)
        return output1, output2
I understand most of it, but what does
output = output.view(output.size()[0], -1)
do?
Do I really need it when I replace self.cnn
with a different network like ResNet or VGG?
It reshapes the output
tensor such that it has the same batch size, but each entry is a flattened vector.
output.size()
returns the tensor's shape as a tuple. output.size()[0]
selects the first entry of that tuple, which by convention is the batch size. output.view()
returns a tensor with the same contents arranged in a different shape, without creating a copy. So output.view(output.size()[0], -1) produces a view whose first dimension matches the batch size; the
-1 tells PyTorch to infer the second dimension automatically so that the total number of elements stays the same.
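The no-copy part is worth stressing: a view shares storage with the original tensor, so writing through one is visible through the other. A minimal sketch (the tensor values here are just illustrative):

```python
import torch

# A 2x3x4 tensor, e.g. batch of 2, each element 3x4
t = torch.arange(24).reshape(2, 3, 4)

# Flatten everything but the batch dimension: shape becomes (2, 12)
v = t.view(t.size(0), -1)
print(v.shape)  # torch.Size([2, 12])

# The view shares the same underlying storage as t
v[0, 0] = 99
print(t[0, 0, 0].item())  # 99
```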
For example, suppose output
has 8 elements, each a 50x40x7 tensor. The shape of output is then 8x50x40x7, and the result of output.view(output.size()[0], -1)
would have the shape 8x14000 (since 50*40*7 = 14000).
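You can check those shapes directly; a quick sketch with a random tensor standing in for the network output:

```python
import torch

# Stand-in for a conv output: batch of 8, each element 50x40x7
output = torch.randn(8, 50, 40, 7)

flat = output.view(output.size()[0], -1)
print(flat.shape)  # torch.Size([8, 14000])
```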
Generally, this is done ahead of a fully connected layer, as a fully connected layer expects a flat vector as input, one per batch element. So you will need it for any network whose output is not already flat. The full torchvision ResNet and VGG classifiers flatten internally before their final linear layers, so used end-to-end they already produce flat vectors and this operation is not needed. But if you take only their convolutional part as a replacement for self.cnn (which is the usual way to use them as a feature extractor), the output is still a 4-D tensor and you must flatten it before self.fc.