Tags: python, tensorflow, pytorch, permutation, dropout

Dropout with permutation in PyTorch


According to PyTorch's documentation on Dropout1d:

Randomly zero out entire channels (a channel is a 1D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 1D tensor input [i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
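
For illustration, a minimal sketch (my own, not from the docs) of that behavior on a toy tensor that is already in the (batch, channels, time) layout Dropout1d expects:

    import torch
    import torch.nn.functional as F

    x = torch.ones(1, 4, 5)                   # (batch, channels, time)
    y = F.dropout1d(x, p=0.5, training=True)
    print(y)
    # each channel (each row of length 5) comes out either all zeros or
    # all 2.0 (= 1 / (1 - 0.5)); channels are never partially dropped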

Then does that mean that, with a tensor of shape (batch, time, channels), permute(0, 2, 1) should be used together with F.dropout1d(), so that the dropout affects the channel dimension?

    x = x.permute(0, 2, 1)   # convert to [batch, channels, time]
    x = F.dropout1d(x, p)
    x = x.permute(0, 2, 1)   # back to [batch, time, channels]

And will this piece of code be equivalent to TensorFlow's SpatialDropout1D?


Solution

  • That's correct: this piece of code will zero out values along the channel dimension and scale the remaining values by 1/(1-p), so that the sum over all elements stays unchanged on average. That is exactly the behavior of TensorFlow's SpatialDropout1D.

    A code snippet to compare what the outputs look like:

    import numpy as np
    import tensorflow as tf
    import torch
    
    # (batch=1, time=2, channels=3), the layout SpatialDropout1D expects
    x = np.array([[[1, 0, 1], [2, 8.2, 2]]])
    
    xtf = tf.convert_to_tensor(x)
    xtf = tf.keras.layers.SpatialDropout1D(0.2)(xtf, training=True)
    print(xtf)
    
    xp = torch.from_numpy(x)
    xp = xp.permute(0, 2, 1)                  # to (batch, channels, time) for dropout1d
    xp = torch.nn.functional.dropout1d(xp, 0.2)
    xp = xp.permute(0, 2, 1)                  # back to (batch, time, channels)
    print(xp)
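
    As a quick sanity check (a sketch of my own, reusing the same toy array), the surviving channels on the PyTorch side should be scaled by 1/(1-p) and the dropped channels should be zero at every time step:

    import numpy as np
    import torch
    import torch.nn.functional as F

    p = 0.2
    x = torch.from_numpy(np.array([[[1, 0, 1], [2, 8.2, 2]]]))  # (batch, time, channels)
    out = F.dropout1d(x.permute(0, 2, 1), p).permute(0, 2, 1)

    # a channel counts as "kept" if any of its time steps is nonzero
    kept = (out[0] != 0).any(dim=0)
    assert torch.allclose(out[0][:, kept], x[0][:, kept] / (1 - p))
    assert torch.all(out[0][:, ~kept] == 0)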