neural-networkpytorchregressionmlp

PyTorch infinite loop in the training and validation step


Dataset and DataLoader's parts are ok, I recycled from another code that I built, but got an infinite loop at that part in my code:

def train(train_loader, MLP, epoch, criterion, optimizer):

 MLP.train()
 epoch_loss = []

 for batch in train_loader:

    optimizer.zero_grad()
    sample, label = batch

    #Forward
    pred = MLP(sample)
    loss = criterion(pred, label)
    epoch_loss.append(loss.data)

    #Backward
    loss.backward()
    optimizer.step()

 epoch_loss = np.asarray(epoch_loss)

 print('Epoch: {}, Loss: {:.4f} +/- {:.4f}'.format(epoch+1, 
 epoch_loss.mean(), epoch_loss.std()))



def test(test_loader, MLP, epoch, criterion):

 MLP.eval()
 with torch.no_grad():
    epoch_loss = []

    for batch in train_loader:

        sample, label = batch

        #Forward
        pred = MLP(sample)
        loss = criterion(pred, label)
        epoch_loss.append(loss.data)

    epoch_loss = np.asarray(epoch_loss)

    print('Epoch: {}, Loss: {:.4f} +/- {:.4f}'.format(epoch+1, 
    epoch_loss.mean(), epoch_loss.std()))

Than, I put it to iterate over the epochs:

for epoch in range(args['num_epochs']):
    train(train_loader, MLP, epoch, criterion, optimizer)
    test(test_loader, MLP, epoch, criterion)
    print('-----------------------')

As it doesn't print even the first loss data, I believe that the logic error is in the training function, but I don't know where it is.

Edit: Here is my MLP Class, the problem can be here too:

class BikeRegressor(nn.Module):

 def __init__(self, input_size, hidden_size, out_size):
    super(BikeRegressor, self).__init__()
    
    self.features = nn.Sequential(nn.Linear(input_size, hidden_size),
                                  nn.ReLU(),
                                  nn.Linear(hidden_size, hidden_size),
                                  nn.ReLU())
    
    self.out = nn.Sequential(nn.Linear(hidden_size, out_size),
                             nn.ReLU())
    
 def forward(self, X):
    
    hidden = self.features(X)
    output = self.out(hidden)
    
    return output

Edit 2: Dataset and Dataloader:

class Bikes(Dataset):
 def __init__(self, data): #data is a Dataframe from Pandas
    self.datas = data.to_numpy()
    
 def __getitem__(self, idx): 
    sample = self.datas[idx][2:14] 
    label = self.datas[idx][-1:] 
    
    
    sample = torch.from_numpy(sample.astype(np.float32))
    label = torch.from_numpy(label.astype(np.float32))
    
    return sample, label

 def __len__(self):
    return len(self.datas)



train_set = Bikes(ds_train)
test_set = Bikes(ds_test)



train_loader = DataLoader(train_set, batch_size=args['batch_size'], shuffle=True, num_workers=args['num_workers'])
test_loader = DataLoader(test_set, batch_size=args['batch_size'], shuffle=True, num_workers=args['num_workers'])

Solution

  • I experienced the same problem and the problem is that jupyter notebook might not work properly with multiprocessing as documented here:

    Note Functionality within this package requires that the __ main __ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the Pool examples will not work in the interactive interpreter.

    You have three options to solve your problem: