machine-learning, deep-learning, pytorch, transfer-learning, torchvision

(Pytorch) mat1 and mat2 shapes cannot be multiplied (212992x13 and 1280x3)


I am trying to do transfer learning with PyTorch pretrained models on a custom dataset. Currently I am getting the error "mat1 and mat2 shapes cannot be multiplied (212992x13 and 1280x3)" while training the custom model.

With EfficientNet the code below works and trains successfully, but with models like SqueezeNet I get the error.

Works:

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

Does not work:

weights = torchvision.models.SqueezeNet1_0_Weights.DEFAULT
model = torchvision.models.squeezenet1_0(weights=weights).to(device)

Train:

auto_transforms = weights.transforms()
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir, test_dir=test_dir, transform=auto_transforms, batch_size=32)

for param in model.features.parameters():
    param.requires_grad = False #Freeze layers

torch.manual_seed(42)
output_shape = len(class_names)
model.classifier = torch.nn.Sequential(
    torch.nn.Dropout(p=0.2, inplace=True),
    torch.nn.Linear(in_features=1280,
                    out_features=output_shape,
                    bias=True)).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

#ERROR DURING TRAIN
results = engine.train(model=model, train_dataloader=train_dataloader, test_dataloader=test_dataloader, optimizer=optimizer, loss_fn=loss_fn, epochs=100, device=device)

The training image size is 512x512.

To make sure this is not a problem with the transforms, I have used the weights' auto transforms, but the problem persists.

Although there is a similar question, mat1 and mat2 shapes cannot be multiplied (128x4 and 128x64), it is entirely about building a new Sequential model from scratch, whereas I am trying to do transfer learning on a pretrained model.


Solution

  • If you replace your classifier with the identity function, you will see what the problem is:

    model.classifier = nn.Identity()
    model(torch.rand(2,3,512,512)).shape
    torch.Size([2, 492032])
    

    The in_features of your classifier linear layer should be 492032, not 1280: for a 512x512 input, SqueezeNet's features block outputs a 512-channel 31x31 map, and 512 * 31 * 31 = 492032 once flattened. Besides, if you look at the source code of SqueezeNet, you will see that model.classifier does not contain a linear layer but a convolutional layer followed by a pooling layer, line 81:

    final_conv = nn.Conv2d(512, self.num_classes, kernel_size=1)
    self.classifier = nn.Sequential(
        nn.Dropout(p=dropout),
        final_conv,
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d((1, 1)))
    

    Since the kernel size is 1x1, this convolution acts like a linear layer applied at each spatial position. You could therefore replace your implementation with the following code (dropout defaults to 0.5 in torchvision's SqueezeNet):

    model.classifier = nn.Sequential(
        nn.Dropout(p=0.5),  # torchvision's SqueezeNet default dropout
        nn.Conv2d(512, len(class_names), kernel_size=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d((1, 1))).to(device)
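
    As a quick sanity check (a minimal sketch, assuming the replacement above has been applied and that class_names has 3 entries, as your error message suggests), you can confirm that the model now returns one logit per class. SqueezeNet's forward pass flattens the classifier output, so the result has shape (batch_size, num_classes) and works directly with nn.CrossEntropyLoss:

    with torch.no_grad():
        # Feature extractor output for a 512x512 image: 512 channels of 31x31,
        # i.e. 512 * 31 * 31 = 492032 values once flattened
        print(model.features(torch.rand(2, 3, 512, 512, device=device)).shape)
        # torch.Size([2, 512, 31, 31])

        # Full model output with the new classifier: one logit per class
        print(model(torch.rand(2, 3, 512, 512, device=device)).shape)
        # torch.Size([2, 3])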