According to PyTorch's documentation on Dropout1d:
Randomly zero out entire channels (a channel is a 1D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 1D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Then does that mean, with a tensor of shape (batch, time, channels), permute(0, 2, 1) should be used along with F.dropout1d(), so that the dropout affects the channel dimension?
x = x.permute(0, 2, 1) # convert to [batch, channels, time]
x = F.dropout1d(x, p)
x = x.permute(0, 2, 1) # back to [batch, time, channels]
And will this piece of code be equivalent to TensorFlow's SpatialDropout1D?
That's correct: this piece of code will zero out entire channels along the channel dimension and scale the remaining values by 1/(1 - p), so that the expected sum over the inputs stays unchanged. That is the same behavior as TensorFlow's SpatialDropout1D.
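For instance, a quick check on an all-ones tensor (a minimal sketch; the shape and p = 0.2 are arbitrary) shows that dropped channels are zeroed across the whole time axis while surviving channels are scaled by 1/(1 - p):
import torch
import torch.nn.functional as F

p = 0.2
x = torch.ones(1, 4, 6)               # (batch, channels, time)
y = F.dropout1d(x, p, training=True)  # zeros whole channels at random

# Each channel is either all zeros or all 1/(1 - p) = 1.25
# (the exact set of dropped channels changes on every call).
print(y)
print(torch.unique(y))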
A code snippet to compare what the outputs look like:
import numpy as np
import tensorflow as tf
import torch

x = np.array([[[1, 0, 1], [2, 8.2, 2]]])  # shape (batch=1, time=2, channels=3)

# TensorFlow: SpatialDropout1D drops whole channels of a (batch, time, channels) tensor
xtf = tf.convert_to_tensor(x)
xtf = tf.keras.layers.SpatialDropout1D(0.2)(xtf, training=True)
print(xtf)

# PyTorch: permute to (batch, channels, time) so dropout1d targets the channel dimension
xp = torch.from_numpy(x)
xp = xp.permute(0, 2, 1)
xp = torch.nn.functional.dropout1d(xp, 0.2)
xp = xp.permute(0, 2, 1)
print(xp)
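Note that the two printed tensors will not match element-wise, since each framework draws its own random mask; what should match is the pattern: whole channels zeroed and the surviving values scaled by 1/(1 - p). If you need this repeatedly inside a model, the same permute-dropout-permute pattern can be wrapped in a module (a sketch; the class name is just illustrative, and nn.Dropout1d requires PyTorch 1.12 or newer):
import torch
import torch.nn as nn

class SpatialDropout1D(nn.Module):
    # Channel dropout for (batch, time, channels) tensors,
    # mirroring tf.keras.layers.SpatialDropout1D.
    def __init__(self, p):
        super().__init__()
        self.dropout = nn.Dropout1d(p)

    def forward(self, x):
        x = x.permute(0, 2, 1)     # (batch, channels, time) for nn.Dropout1d
        x = self.dropout(x)
        return x.permute(0, 2, 1)  # back to (batch, time, channels)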