I create a neural network like this:
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, BiasUnit, FullConnection

n = FeedForwardNetwork()

# modules: 43 linear inputs, a bias unit, 100 sigmoid hidden units, 1 linear output
inLayer = LinearLayer(43)
bias = BiasUnit()
hiddenLayer = SigmoidLayer(100)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(bias)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

# fully connect input -> hidden, bias -> hidden and hidden -> output
in_to_hidden = FullConnection(inLayer, hiddenLayer)
bias_to_hidden = FullConnection(bias, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(bias_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
I train it the following way (I'm simplifying; in reality it is trained over multiple iterations):
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SupervisedDataSet
import numpy as np

self.trainer = BackpropTrainer(self.neural_net, learningrate=0.8)
(...)
# self.net_input_size inputs, one target value per sample
ds = SupervisedDataSet(self.net_input_size, 1)
ds.addSample([...], np.float64(learned_value))
(...)
self.trainer.trainOnDataset(ds)
Sometimes I get the following warnings:
(...)/lib/python3.5/site-packages/PyBrain-0.3.1-py3.5.egg/pybrain/supervised/trainers/backprop.py:99: RuntimeWarning: overflow encountered in square
  error += 0.5 * sum(outerr ** 2)
(...)/lib/python3.5/site-packages/PyBrain-0.3.1-py3.5.egg/pybrain/structure/modules/sigmoidlayer.py:14: RuntimeWarning: invalid value encountered in multiply
  inerr[:] = outbuf * (1 - outbuf) * outerr
And then when I check the saved net file I see that all weights are nan:
(...)
<FullConnection class="pybrain.structure.connections.full.FullConnection" name="FullConnection-8">
<inmod val="BiasUnit-5"/>
<outmod val="SigmoidLayer-11"/>
<Parameters>[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]</Parameters>
</FullConnection>
(...)
As suggested, here is an answer:
A learning rate of 0.8 is far too high; it can lead to exactly the errors you are seeing and prevent the network from learning effectively.
With such a high learning rate, depending on your cost function, the network can change the weights by very large amounts in a single update, so the weights can overflow into NaN values.
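For illustration, here is a minimal NumPy sketch of that failure mode (the magnitudes are made up; the two expressions mirror the lines PyBrain warns about in backprop.py and sigmoidlayer.py):

import numpy as np

# made-up magnitude: once an output error gets this large, squaring it overflows to inf
outerr = np.array([1e200])
error = 0.5 * np.sum(outerr ** 2)                    # overflow encountered in square -> inf

# a saturated sigmoid output (exactly 0.0 or 1.0) has a zero derivative;
# zero times an infinite error is nan
outbuf = np.array([1.0])
inerr = outbuf * (1 - outbuf) * np.array([np.inf])   # invalid value encountered in multiply -> nan

print(error, inerr)                                  # inf [nan]

Once a nan shows up in the backpropagated error, every weight it touches during the update becomes nan as well, which is why the whole <Parameters> list in your saved file ends up as nan.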
Generally (even if your weights do not overflow into NaN values), a high learning rate is also a bad idea in terms of learning. Your network solves a specific problem by learning from a large training data set. If the learning rate is very high, like 0.8, the network adapts very strongly to the current epoch's data. Most of the information / features learned from earlier epochs is then lost, because the network adjusts itself heavily to the current epoch's error.
For most problems, typical learning rates are something like 0.01 or 0.001, or even less, because you want to draw only small conclusions from any single epoch and instead learn features that are invariant across several epochs.
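Concretely, and assuming the rest of your setup stays as in your question, a minimal sketch of the change is just to create the trainer with a much smaller learning rate (0.01 and the momentum value below are only starting points to tune, not definitive settings):

from pybrain.supervised.trainers import BackpropTrainer

self.trainer = BackpropTrainer(self.neural_net,
                               learningrate=0.01,  # much smaller weight updates per step
                               momentum=0.1)       # optional: smooths successive updates
self.trainer.trainOnDataset(ds)

Also note that if a saved net already contains nan weights, lowering the learning rate alone won't recover it; you have to rebuild or re-initialise the network (as in your first snippet) before retraining, because nan weights never become finite again.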