Tags: python, optimization, pytorch, conv-neural-network

My train accuracy remains at 10% when I add the weight_decay parameter to my optimizer in PyTorch. I am using the CIFAR10 dataset and a LeNet CNN model


I am training a LeNet CNN model on the CIFAR10 dataset, using PyTorch on Google Colab. The code only works when I use the Adam optimizer with model.parameters() as its only argument. But when I change the optimizer or pass the weight_decay parameter, the accuracy stays at 10% through all the epochs. I cannot understand why this is happening.

# Imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as utils_data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# CNN Model - LeNet
class LeNet_ReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_model = nn.Sequential(nn.Conv2d(3, 6, 5),        # 3 input channels -> 6 feature maps, 5x5 kernel
                                       nn.ReLU(),
                                       nn.AvgPool2d(2, stride=2), # 28x28 -> 14x14
                                       nn.Conv2d(6, 16, 5),       # 6 -> 16 feature maps, 5x5 kernel
                                       nn.ReLU(),
                                       nn.AvgPool2d(2, stride=2)) # 10x10 -> 5x5
        self.fc_model = nn.Sequential(nn.Linear(400, 120),        # 16 * 5 * 5 = 400 flattened features
                                      nn.ReLU(),
                                      nn.Linear(120, 84),
                                      nn.ReLU(),
                                      nn.Linear(84, 10))          # 10 CIFAR10 classes

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 400)
        x = self.fc_model(x)
        return x

# Importing dataset and creating dataloader
batch_size = 128
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                    transform=transforms.ToTensor())
trainloader = utils_data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True,
                                    transform=transforms.ToTensor())
testloader = utils_data.DataLoader(testset, batch_size=batch_size, shuffle=False)

# Creating instance of the model
net = LeNet_ReLU()

# Evaluation function
def evaluation(dataloader):
    total, correct = 0, 0
    with torch.no_grad():  # no gradients needed during evaluation
        for data in dataloader:
            inputs, labels = data

            outputs = net(inputs)
            _, pred = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (pred == labels).sum().item()
    return correct / total * 100

# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters(), weight_decay=0.9)

# Model training
loss_epoch_arr = []
max_epochs = 16

for epoch in range(max_epochs):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        opt.zero_grad()  # clear gradients from the previous step
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        opt.step()

    loss_epoch_arr.append(loss.item())  # loss of the last batch in the epoch

    print('Epoch: %d/%d, Test acc: %0.2f, Train acc: %0.2f'
          % (epoch + 1, max_epochs, evaluation(testloader), evaluation(trainloader)))

plt.plot(loss_epoch_arr)

Solution

The weight decay mechanism penalizes large weights: it constrains the weights to relatively small values by adding the sum of their squares, multiplied by the weight_decay argument you pass, to the loss. It can be seen as a quadratic (L2) regularization term.
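To make that concrete, here is a minimal sketch of the equivalent manual penalty, reusing net, loss_fn, inputs and labels from your code. The name wd and the factor of 1/2 are mine, chosen so the gradients match what weight_decay=wd does (it adds wd * w to each weight's gradient):

# Sketch: a manual L2 penalty that mirrors weight_decay (wd is a made-up name)
wd = 0.9
opt = optim.Adam(net.parameters())  # no weight_decay argument here

outputs = net(inputs)
l2_penalty = sum(p.pow(2).sum() for p in net.parameters())
loss = loss_fn(outputs, labels) + (wd / 2) * l2_penalty  # gradient: wd * w per weight
loss.backward()

With wd = 0.9 this penalty term dwarfs the cross-entropy loss, so the optimizer mostly shrinks the weights toward zero instead of fitting the data.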

When you pass a large weight_decay value, you may constrain your network too much and prevent it from learning. That is probably why you see 10% accuracy: with 10 classes, 10% is what you get from pure guessing, i.e. when the output is not a function of the input at all.

The solution is to play around with different values: try a weight_decay of 1e-4 or other values in that range, as in the one-line change below. Note that as the value gets closer to zero, the results should get closer to those of your initial training without weight decay.
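For example (1e-4 is just a common starting point, not a tuned value):

# Much milder penalty; everything else in the question stays the same
opt = optim.Adam(net.parameters(), weight_decay=1e-4)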

Hope that helps.