pytorch, batch-normalization, dropout

PyTorch training with dropout and/or batch-normalization


A model should be set to evaluation mode for inference by calling model.eval().
Do we also need to do this during training, before getting the model outputs, e.g. within a training epoch, if the network contains one or more dropout and/or batch-normalization layers?

If this is not done, is the output of the forward pass in the training epoch affected by the randomness in the dropout?

Many code examples do not do this; the common approach is something along these lines:

for t in range(num_epochs):
    # forward pass
    yhat = model(x)
  
    # get the loss
    loss = criterion(yhat, y)
    
    # backward pass, optimizer step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

For example, here is one code sample to look at: convolutional_neural_network/main.py

Should this instead be?

for t in range(num_epochs):
    # forward pass
    model.eval() # disable dropout etc
    yhat = model(x)
    
    # get the loss
    loss = criterion(yhat, y)
    
    # backward pass, optimizer step
    model.train()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Solution

  • TLDR:

    Should this instead be?

    No!

    Why?

    More explanation:
    Different modules behave differently depending on whether they are in training or evaluation/test mode.
    BatchNorm and Dropout are only two examples of such modules; basically any module that has distinct training-phase behaviour follows this rule.
    When you call .eval(), you are signaling all modules in the model to shift their operation accordingly.
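
    For instance, here is a minimal sketch (the model here is purely illustrative) showing how .train()/.eval() flip the training flag on every submodule:

        import torch.nn as nn

        # an illustrative model mixing mode-dependent and mode-independent modules
        model = nn.Sequential(
            nn.Linear(10, 10),
            nn.BatchNorm1d(10),
            nn.ReLU(),
            nn.Dropout(p=0.5),
        )

        model.eval()   # sets module.training = False on every submodule
        print(all(not m.training for m in model.modules()))   # True

        model.train()  # flips every submodule back to training mode
        print(all(m.training for m in model.modules()))       # True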

    Update
    The answer is: during training you should not use eval mode. And yes, as long as you have not set eval mode, dropout will be active and act randomly on each forward pass. Similarly, all other modules that have two phases will behave accordingly. That is, BN will always update its running mean/var on each pass; also, if you use a batch size of 1, it will error out, as it cannot compute batch statistics from a single sample.
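
    A small sketch of both effects (the tensor shapes here are arbitrary):

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        x = torch.ones(4, 8)

        drop = nn.Dropout(p=0.5)
        drop.train()
        print(drop(x))   # random mask; differs between forward passes
        print(drop(x))   # a different random mask

        bn = nn.BatchNorm1d(8)
        bn.train()
        print(bn.running_mean)     # zeros before any forward pass
        bn(torch.randn(4, 8))
        print(bn.running_mean)     # updated by the batch statistics

        try:
            bn(torch.randn(1, 8))  # batch of 1 while in training mode
        except ValueError as e:
            print(e)               # "Expected more than 1 value per channel when training ..."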

    As was pointed out in the comments, it should be noted that during training you should not call eval() before the forward pass, as that effectively disables all modules that have different phases for train/test mode, such as BN and Dropout (basically any module that has updatable state, or that impacts the effective network topology the way dropout does), and you will not see them contributing to your network's learning. So don't code it like that!
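
    A sketch of the usual pattern instead, reusing the model/x/y/criterion/optimizer names from the question (x_val is a hypothetical held-out batch):

        import torch

        model.train()   # dropout active, BN updates its running statistics
        for t in range(num_epochs):
            yhat = model(x)
            loss = criterion(yhat, y)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        model.eval()              # dropout off, BN uses its running statistics
        with torch.no_grad():     # also skip gradient bookkeeping for inference
            val_pred = model(x_val)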

    Let me explain a bit what happens during training:
    The modules that make up your model may have two modes, training and test mode. Such modules either have internal statistics that need to be updated during training, like BN, or affect the network topology in a sense, like Dropout (by disabling some features during the forward pass). Some modules, such as ReLU(), operate the same way in both modes and thus do not change when the mode changes.
    When you are in training mode, you feed an image; it passes through the layers until it faces a dropout layer, and there some features are disabled, so their responses to the next layer are omitted. The output goes on through the other layers until it reaches the end of the network and you get a prediction.
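
    You can see the disabling directly; note that PyTorch's nn.Dropout uses "inverted dropout", scaling the surviving activations by 1/(1-p) during training so that eval mode can simply pass values through unchanged:

        import torch
        import torch.nn as nn

        drop = nn.Dropout(p=0.25)
        x = torch.ones(1, 8)

        drop.train()
        print(drop(x))   # some entries zeroed; survivors scaled by 1/(1 - 0.25) = 1.3333

        drop.eval()
        print(drop(x))   # identity: all ones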

    The prediction may be correct or wrong, and the weights are updated accordingly: if the answer was right, the features (or combinations of features) that resulted in the correct answer are positively reinforced, and vice versa. So during training you do not need to, and should not, disable dropout: it affects the output, and it should, so that the model learns a better set of features.