Consider the following model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm

model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
I understand the idea behind Dropout for regularization. As I understand it, Dropout is applied per layer with a rate p, which determines the probability of a neuron being dropped. In the example above, I cannot tell whether the first Dropout layer is applied to the first hidden layer or to the second hidden layer. Since dropout is applied per layer, what confuses me is that Keras treats Dropout as a layer in its own right. Moreover, if the first Dropout layer applies to the second hidden layer, what about the second Dropout layer? Is it applied to the output layer (which would not be valid at all, since dropping output neurons makes no sense)? Can someone please clarify these points?
As per the Keras documentation:
Applies Dropout to the input.
Therefore, it is the input to the Dropout layer that gets dropped with probability p. In your case that means the layer immediately before it: the first Dropout(0.2) masks the outputs of the first hidden layer, dropping each of its 60 neurons independently with probability 0.2, i.e. about 12 of the 60 on average per training step.
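You can see this behavior in isolation with a minimal sketch (assuming TensorFlow 2.x); it just feeds a tensor of ones through a standalone Dropout layer:

import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 10), dtype="float32")
# With training=True, roughly 20% of the input values are zeroed and the
# survivors are scaled by 1/(1-0.2)=1.25 to keep the expected sum unchanged.
print(drop(x, training=True).numpy())
# With training=False (inference), Dropout is the identity.
print(drop(x, training=False).numpy())

Note that Dropout acts on the values it receives as input, which is why, inside a Sequential model, it masks the activations of the layer directly above it.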
It also would not make sense for Dropout to act on the layer that follows it: the second Dropout would then drop units from the output layer, which in classification is the prediction itself.
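To double-check the ordering in your own model, you can print the layers (a quick sketch, assuming the model above has been defined):

# Each Dropout appears directly after the Dense layer whose activations
# it masks; there is no Dropout after the final output layer.
for layer in model.layers:
    print(layer.name)

This prints dense, dropout, dense_1, dropout_1, dense_2, confirming that the second Dropout sits between the 30-unit hidden layer and the output layer, masking the hidden layer's activations rather than the output.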