machine-learning, keras, deep-learning, neural-network, keras-layer

Avoiding the vanishing gradient problem in deep neural networks


I'm taking a look at Keras to try to dive into deep learning.

From what I know, stacking even just a few dense layers effectively stops backpropagation from working because of the vanishing gradient problem.

I found out that there is a pre-trained VGG-16 network that you can download and build on top of.
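
For context, loading that pre-trained VGG-16 in Keras and stacking a small classifier on top looks roughly like the sketch below (the input shape and the 10-class head are placeholders I picked, not anything specific to my problem):

    # Rough sketch: reuse the pre-trained VGG-16 convolutional base and
    # train only a new classifier head on top of it.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras import layers, models

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pre-trained layers

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(10, activation="softmax"),  # placeholder: 10 classes
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])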

This network has 16 layers, so I guess this is the territory where you hit the vanishing gradient problem.

Suppose I wanted to train the network myself in Keras. How should I do it? Should I divide the layers into groups, train them independently as autoencoders, then stack a classifier on top and train that? Is there a built-in mechanism for this in Keras?


Solution

  • No. The vanishing gradient problem is not as prevalent as it used to be: pretty much all modern networks (except recurrent ones) use ReLU activations, which are considerably less prone to this problem.

    You should just train a network from scratch and see how it works. Do not try to deal with a problem that you don't have yet.
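
A minimal sketch of what that from-scratch training could look like, assuming a generic image-classification setup (28×28 grayscale inputs and 10 classes; these specifics are assumptions, not part of the answer above):

    # Minimal sketch: a plain feed-forward network trained from scratch.
    # ReLU activations in the hidden layers keep gradients from shrinking
    # layer after layer the way saturating activations (sigmoid/tanh) do.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28)),    # assumed 28x28 inputs
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),  # assumed 10 classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)  # x_train / y_train: your data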