deep-learning, pytorch, neural-network, tabular, mlp

Neural network not learning at all


I am training an MLP on a tabular dataset, the pendigits dataset. The problem is that the training loss and accuracy are more or less stable, while the validation and test loss and accuracy are completely constant. The pendigits dataset contains 10 classes. My code is exactly the same as in other experiments that I did, for example on MNIST or CIFAR10, which work correctly. The only things that change are the dataset (from MNIST/CIFAR10 to pendigits) and the network (from a ResNet-18 to a simple MLP). Below are the training function and the network:

def train(net, loaders, optimizer, criterion, epochs=100, dev=dev, save_param = True, model_name="only-pendigits"):
    torch.manual_seed(myseed)
    try:
        net = net.to(dev)
        print(net)
        # Initialize history
        history_loss = {"train": [], "val": [], "test": []}
        history_accuracy = {"train": [], "val": [], "test": []}
        # Process each epoch
        for epoch in range(epochs):
            # Initialize epoch variables
            sum_loss = {"train": 0, "val": 0, "test": 0}
            sum_accuracy = {"train": 0, "val": 0, "test": 0}
            # Process each split
            for split in ["train", "val", "test"]:
                # Process each batch
                for (input, labels) in loaders[split]:
                    # Move to CUDA
                    input = input.to(dev)
                    labels = labels.to(dev)
                    # Reset gradients
                    optimizer.zero_grad()
                    # Compute output
                    pred = net(input)
                    #labels = labels.long()
                    loss = criterion(pred, labels)
                    # Update loss
                    sum_loss[split] += loss.item()
                    # Check parameter update
                    if split == "train":
                        # Compute gradients
                        loss.backward()
                        # Optimize
                        optimizer.step()
                    # Compute accuracy
                    _,pred_labels = pred.max(1)
                    batch_accuracy = (pred_labels == labels).sum().item()/input.size(0)
                    # Update accuracy
                    sum_accuracy[split] += batch_accuracy
                scheduler.step()
            # Compute epoch loss/accuracy
            epoch_loss = {split: sum_loss[split]/len(loaders[split]) for split in ["train", "val", "test"]}
            epoch_accuracy = {split: sum_accuracy[split]/len(loaders[split]) for split in ["train", "val", "test"]}
            # Update history
            for split in ["train", "val", "test"]:
                history_loss[split].append(epoch_loss[split])
                history_accuracy[split].append(epoch_accuracy[split])
            # Print info
            print(f"Epoch {epoch+1}:",
                  f"TrL={epoch_loss['train']:.4f},",
                  f"TrA={epoch_accuracy['train']:.4f},",
                  f"VL={epoch_loss['val']:.4f},",
                  f"VA={epoch_accuracy['val']:.4f},",
                  f"TeL={epoch_loss['test']:.4f},",
                  f"TeA={epoch_accuracy['test']:.4f},",
                  f"LR={optimizer.param_groups[0]['lr']:.5f},")
    except KeyboardInterrupt:
        print("Interrupted")
    finally:
        # Plot loss
        plt.title("Loss")
        for split in ["train", "val", "test"]:
            plt.plot(history_loss[split], label=split)
        plt.legend()
        plt.show()
        # Plot accuracy
        plt.title("Accuracy")
        for split in ["train", "val", "test"]:
            plt.plot(history_accuracy[split], label=split)
        plt.legend()
        plt.show()

Network:

# TEXT NETWORK
class TextNN(nn.Module):

    # Constructor
    def __init__(self):
        # Call parent constructor
        super().__init__()
        torch.manual_seed(myseed)
        self.relu = nn.ReLU()
        self.linear1 = nn.Linear(16, 128) # 16 is the number of input columns
        self.linear2 = nn.Linear(128, 128)
        self.linear3 = nn.Linear(128, 32)
        self.linear4 = nn.Linear(32, 10)
    
    def forward(self, tab):
        tab = self.linear1(tab)
        tab = self.relu(tab)
        tab = self.linear2(tab)
        tab = self.relu(tab)
        tab = self.linear3(tab)
        tab = self.relu(tab)
        tab = self.linear4(tab)

        return tab

model = TextNN()
print(model)
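
Note that the training function above relies on a few globals that are not shown in the post (myseed, dev, scheduler). A minimal sketch of how they might be defined, with placeholder optimizer, scheduler, and hyperparameter choices (not the original values), using the loaders defined further below:

import torch
import torch.nn as nn
import torch.optim as optim

myseed = 42
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loaders = {"train": train_loader, "val": val_loader, "test": test_loader}

train(model, loaders, optimizer, criterion, epochs=100)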

Is it possible that the model is so simple that it does not learn anything? I do not think so. I think that there is some error in the training (but the function is exactly the same as the one I use for MNIST or CIFAR10, which works correctly) or in the data loading. Below is how I load the dataset:

pentrain = pd.read_csv("pendigits.tr.csv")
pentest = pd.read_csv("pendigits.te.csv")

class TextDataset(Dataset):
    """Tabular and Image dataset."""

    def __init__(self, excel_file, transform=None):
        self.excel_file = excel_file
        #self.tabular = pd.read_csv(excel_file)
        self.tabular = excel_file
        self.transform = transform

    def __len__(self):
        return len(self.tabular)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        tabular = self.tabular.iloc[idx, 0:]

        y = tabular["class"]

        tabular = tabular[['input1', 'input2', 'input3', 'input4', 'input5', 'input6', 'input7',
       'input8', 'input9', 'input10', 'input11', 'input12', 'input13',
       'input14', 'input15', 'input16']]
        tabular = tabular.tolist()
        tabular = torch.FloatTensor(tabular)
        
        if self.transform:
            tabular = self.transform(tabular)

        return tabular, y

penditrain = TextDataset(excel_file=pentrain, transform=None)

train_size = int(0.80 * len(penditrain))
val_size = int((len(penditrain) - train_size))

pentrain, penval = random_split(penditrain, (train_size, val_size))

pentest = TextDataset(excel_file=pentest, transform=None)

Everything is loaded correctly; indeed, if I print an example:

text_x, label_x = pentrain[0]
print(text_x.shape, label_x)
text_x

torch.Size([16]) 1
tensor([ 48.,  74.,  88.,  95., 100., 100.,  78.,  75.,  66.,  49.,  64.,  23.,
         32.,   0.,   0.,   1.])

And these are my dataloaders:

#Define generators
generator=torch.Generator()
generator.manual_seed(myseed)

# Define loaders
from torch.utils.data import DataLoader
train_loader = DataLoader(pentrain, batch_size=128, num_workers=2, drop_last=True, shuffle=True, generator=generator)
val_loader   = DataLoader(penval,   batch_size=128, num_workers=2, drop_last=False, shuffle=False, generator=generator)
test_loader  = DataLoader(pentest,  batch_size=128, num_workers=2, drop_last=False, shuffle=False, generator=generator)
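
As a quick sanity check (my own addition, not from the original post), pulling one batch confirms the shapes and dtypes; assuming the criterion is nn.CrossEntropyLoss, the labels must be integer class indices:

xb, yb = next(iter(train_loader))
print(xb.shape, xb.dtype)  # expected: torch.Size([128, 16]) torch.float32
print(yb.shape, yb.dtype)  # expected: torch.Size([128]) torch.int64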

I have been stuck on this problem for 2 days, and I do not know what the problem is...

EDIT: Basically, if I write print(list(net.parameters())) at the beginning of each epoch, I see that the weights never change, and for this reason the loss and accuracy remain constant. Why are the weights not changing?
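
For anyone debugging the same symptom, a minimal sketch of a programmatic weight-change check (the helper below is my own illustration, not code from the original post):

import torch

def weights_changed(net, snapshot):
    # Compare the current parameters against a snapshot taken earlier;
    # returns whether anything changed, plus a fresh snapshot.
    current = [p.detach().clone() for p in net.parameters()]
    if snapshot is None:
        return False, current
    changed = any(not torch.equal(old, new) for old, new in zip(snapshot, current))
    return changed, current

# Usage at the top of each epoch (start with snap = None):
# changed, snap = weights_changed(net, snap)
# print(f"Epoch {epoch+1}: weights changed = {changed}")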

EDIT2: The problem is exactly the same with another dataset, for example sklearn's digits.

EDIT3: I see online that a simple MLP like the one I am using obtains good results on these datasets. I compared my training function with online notebooks, and the steps are the same. Moreover, my training function works on other datasets like MNIST. So I do not know where the problem is...


Solution

  • I solved it... The mistake was that I was calling model = TextNN() again after instantiating the optimizer, so the weights were not changing... So every part was OK, apart from the optimizer, which was working with another (unused) model.
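
For future readers, a minimal sketch of the failure mode described above (the optimizer choice and learning rate are placeholders, since the question does not show them):

import torch.optim as optim

# Broken ordering: the optimizer captures the parameters of the first model...
model = TextNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)
model = TextNN()  # ...but this rebinds `model` to a fresh network the optimizer never sees.
# Training `model` now changes nothing: optimizer.step() only updates the old, unused model.

# Correct ordering: create the model once, then build the optimizer from its parameters.
model = TextNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)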