python, deep-learning, pytorch, conv-neural-network, nas

How to fix the input dimension from convolution flatten to feed forward layer?


I am using the nni framework in Python to do Neural Architecture Search. In it I have defined the model as:

import torch
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(14400, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x) #Here is error coming
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

Apart from building the model, the above code also gives the algorithm below a choice between two layers for the first convolution layer: one with a 3x3 kernel and one with a 5x5 kernel.

Also, I am new to PyTorch, so let me know if you can already see a mistake in the above.

Moving on, it is coupled with the code below:

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()  # assumed here; use whatever transform you need
dataset_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
dataset_valid = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

model = Net()  # the model defined above
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)

# use NAS here
def top1_accuracy(output, target):
    # this is the function that computes the reward, as required by ENAS algorithm
    batch_size = target.size(0)
    _, predicted = torch.max(output.data, 1)
    return (predicted == target).sum().item() / batch_size

def metrics_fn(output, target):
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.algorithms.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=metrics_fn,
                           reward_function=top1_accuracy,
                           optimizer=optimizer,
                           batch_size=128,
                           num_epochs=10,  # 10 epochs
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           log_frequency=10)  # print log every 10 steps
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file

What the above does is download the CIFAR-10 dataset, use the model defined above to train on it, and find which architecture performs best (based on the two layer choices; you can have more choices as well). But it raises an error:

     22         x = self.dropout1(x)
     23         x = torch.flatten(x, 1)
---> 24         x = self.fc1(x)
     25         x = F.relu(x)
     26         x = self.dropout2(x)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x12544 and 14400x128)

I know this is because the flatten layer produces a dimension that is not what the first fully connected layer expects. When I change it to what the error says, I get the error below:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x14400 and 12544x128)

I believe it happens because of the choice in the first convolution layer. My question is: how do I fix this? And if nni or anything else here is unclear to you, note that in Keras there is the option of defining a fully connected layer by giving only the number of hidden units, without mentioning the input dimension. But PyTorch seems to require the input dimension to be specified correctly. Is there a way, after flatten, to declare a fully connected hidden layer with just the number of units and not the input shape, which I believe is what is causing the problems?
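For reference, a quick way to see how the flattened size depends on the conv1 choice (a rough sketch, assuming 32x32 CIFAR-10 inputs and the same conv2/relu/max-pool as in forward()) is to push a dummy batch through each candidate and print how many features reach fc1:

import torch
import torch.nn as nn
import torch.nn.functional as F

# run a dummy 32x32 RGB image through each conv1 candidate and count
# the features that would arrive at fc1 after flatten
for kernel_size, padding in [(3, 1), (5, 1)]:
    conv1 = nn.Conv2d(3, 32, kernel_size=kernel_size, stride=1, padding=padding)
    conv2 = nn.Conv2d(32, 64, 3, 1)
    x = torch.zeros(1, 3, 32, 32)                        # dummy CIFAR-10 image
    x = F.max_pool2d(F.relu(conv2(F.relu(conv1(x)))), 2)
    print(kernel_size, padding, torch.flatten(x, 1).shape[1])

This prints 14400 for the 3x3 choice and 12544 for the 5x5 choice, which are exactly the two numbers appearing in the errors above.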


Solution

  • For a conv with kernel_size=5 you need padding=2, not 1.
    Fix:

            self.conv1 = mutables.LayerChoice([
                nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
                nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
            ]) 
    

    to

            self.conv1 = mutables.LayerChoice([
                nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
                nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2)  # match padding size to kernel size
            ])  
    

    Update:
    Recent versions of pytorch allow you to specify padding='same' and avoid the need to come up with the correct value for padding.

    However, I strongly urge you to use the formula for computing the output shape of a convolution layer (found here) and manually compute the correct value for padding. This is a good sanity check to ensure you understand what you are doing.
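
    As a worked check (my own sketch of that formula, assuming 32x32 CIFAR-10 inputs and the layers from the question), the output size of a conv layer with dilation 1 is floor((in + 2*padding - kernel_size) / stride) + 1, and with the fixed padding both branches feed the same 14400 features into fc1:

        def conv_out(size, kernel_size, stride=1, padding=0):
            # output size of a conv layer with dilation=1:
            # floor((size + 2*padding - kernel_size) / stride) + 1
            return (size + 2 * padding - kernel_size) // stride + 1

        # 3x3 branch: conv1(padding=1) -> conv2 -> 2x2 max pool
        s = conv_out(32, 3, padding=1)   # 32
        s = conv_out(s, 3)               # 30
        s = s // 2                       # 15
        print(64 * s * s)                # 14400, matches nn.Linear(14400, 128)

        # 5x5 branch with the corrected padding=2
        s = conv_out(32, 5, padding=2)   # 32 (with padding=1 this would be 30, leading to 12544)
        s = conv_out(s, 3)               # 30
        s = s // 2                       # 15
        print(64 * s * s)                # 14400 again, so both choices fit the same fc1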