
Error in the simplest example in Flux.jl


I am testing the example here: https://fluxml.ai/Flux.jl/stable/models/overview/

using Flux

# Target function and data: a 1×6 training matrix and a 1×5 test matrix
actual(x) = 4x + 2
x_train, x_test = hcat(0:5...), hcat(6:10...)
y_train, y_test = actual.(x_train), actual.(x_test)

# A single Dense layer: one input feature, one output
predict = Dense(1 => 1)
predict(x_train)

# Mean-squared-error loss
loss(x, y) = Flux.Losses.mse(predict(x), y)
loss(x_train, y_train)

using Flux: train!
opt = Descent(0.1)
data = [(x_train, y_train)]

parameters = Flux.params(predict)
predict.weight in parameters, predict.bias in parameters

# One training step, then check the loss again
train!(loss, parameters, data, opt)
loss(x_train, y_train)

# Train for 1000 epochs
for epoch in 1:1000
    train!(loss, parameters, data, opt)
end

loss(x_train, y_train)

# Compare the predictions on the test set with the true values
predict(x_test)
y_test

As you can see, it is just a very simple model, actual(x) = 4x + 2. If you run this code you get an almost perfect prediction:

predict(x_test):
1×5 Matrix{Float32}: 26.0001 30.0001 34.0001 38.0001 42.0001

y_test:
1×5 Matrix{Int64}: 26 30 34 38 42

But if I make one minor change and feed the model a single extra training point, like this:

x_train, x_test = hcat(0:6...), hcat(6:10...)

So the only thing I changed is the definition of x_train: the 5 became a 6. With that, the predictions come out as NaN:

predict(x_test):
1×5 Matrix{Float32}: NaN NaN NaN NaN NaN

y_test:
1×5 Matrix{Int64}: 26 30 34 38 42

But why?


Solution

  • I think this is simply a case of a learning rate that is too high. I can reproduce the same NaN behaviour with Descent(0.1): printing the loss during training shows it going to Inf before it becomes NaN, a classic sign of divergence caused by too large a step size. It is probably diverging when x_train is hcat(0:6...), presumably because the extra point makes the gradients just large enough that the 0.1 updates overshoot. With a learning rate of 0.01 it works just fine and gives the expected answer: the smaller steps let the network settle into the minimum as expected.
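
    A minimal sketch of that fix, keeping the same code and implicit Flux.params API as above and only lowering the learning rate to 0.01 (the periodic loss printout is just there to watch the training converge instead of blowing up):

    using Flux
    using Flux: train!

    actual(x) = 4x + 2
    x_train, x_test = hcat(0:6...), hcat(6:10...)   # the enlarged training set that diverged before
    y_train, y_test = actual.(x_train), actual.(x_test)

    predict = Dense(1 => 1)
    loss(x, y) = Flux.Losses.mse(predict(x), y)

    opt = Descent(0.01)                 # smaller step size than the tutorial's 0.1
    data = [(x_train, y_train)]
    parameters = Flux.params(predict)

    for epoch in 1:1000
        train!(loss, parameters, data, opt)
        # print the loss occasionally: it should keep decreasing instead of hitting Inf/NaN
        epoch % 200 == 0 && println("epoch $epoch: loss = ", loss(x_train, y_train))
    end

    predict(x_test)   # should now be close to 26 30 34 38 42 rather than NaN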