python, pytorch, classification, vgg-net, cross-entropy

RuntimeError: weight tensor should be defined either for all 1000 classes or no classes but got weight tensor of shape: [5]


I'm trying to use VGG16 for a **5-class dataset**. I've already added 5 new layers so that the final output has 5 logits.

model = models.vgg16(pretrained=True) #Downloads the vgg16 model which is pretrained on Imagenet dataset.

#Replace the Final layer of pretrained vgg16 with 5 new layers.
model.fc = nn.Sequential(nn.Linear(1000,512),
                         nn.ReLU(inplace=True),
                         nn.Linear(512,256),
                         nn.ReLU(inplace=True),
                         nn.Linear(256,128),
                         nn.ReLU(inplace=True),
                         nn.Linear(128,64),
                         nn.ReLU(inplace=True),
                         nn.Linear(64,5))

And my loss function is as follows:

loss_fn = nn.CrossEntropyLoss(weight=class_weights) #CrossEntropyLoss with class_weights.

where class_weights is defined as follows:

from sklearn.utils import class_weight #For calculating weights for each class.
class_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                  classes=np.array([0, 1, 2, 3, 4]),
                                                  y=train_df['level'].values)
class_weights = torch.tensor(class_weights, dtype=torch.float).to(device)
 
print(class_weights) #Prints the calculated weights for the classes.

output: tensor([0.2556, 4.6000, 1.5333, 9.2000, 9.2000], device='cuda:0')

After the first epoch I get the error shown below:

RuntimeError: weight tensor should be defined either for all 1000 classes or no classes but got weight tensor of shape: [5]


Solution

  • I faced the same problem as you. I started by changing the size of my final classifier layer (I copied the code from here):

    model = models.mobilenet_v2(pretrained=True)
    
    # Replace the last classifier layer with one sized for the dataset's class count.
    last_item_index = len(model.classifier) - 1
    old_fc = model.classifier.__getitem__(last_item_index)
    new_fc = nn.Linear(in_features=old_fc.in_features, 
                       out_features=129, bias=True)
    model.classifier.__setitem__(last_item_index, new_fc)
    

    After changing this, I printed the model architecture using the following code:

    from torchsummary import summary
    summary(model, (3, 224, 224))
    

    And it's working (the number of classes in my dataset is 129):

    (classifier): Sequential(
      (0): Dropout(p=0.2, inplace=False)
      (1): Linear(in_features=1280, out_features=129, bias=True)
    )