I have run this code in Google Colab with a GPU to create a multilayer LSTM. It is for time series prediction.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(LSTM(units=50, activation='relu', return_sequences=True,
               input_shape=(1, len(FeaturesDataFrame.columns))))
model.add(Dropout(0.2))
model.add(LSTM(3, return_sequences=False))
model.add(Dense(1))
opt = SGD(learning_rate=0.01, momentum=0.9, clipvalue=5.0)
model.compile(loss='mean_squared_error', optimizer=opt)
Note that I have used gradient clipping. But still, when I train this model, it returns nan as the training loss:
history = model.fit(X_t_reshaped, train_labels, epochs=20, batch_size=96, verbose=2)
This is the result:
Epoch 1/20  316/316 - 2s - loss: nan
Epoch 2/20  316/316 - 1s - loss: nan
Epoch 3/20  316/316 - 1s - loss: nan
Epoch 4/20  316/316 - 1s - loss: nan
Epoch 5/20  316/316 - 1s - loss: nan
Epoch 6/20  316/316 - 1s - loss: nan
Epoch 7/20  316/316 - 1s - loss: nan
Epoch 8/20  316/316 - 1s - loss: nan
Epoch 9/20  316/316 - 1s - loss: nan
Epoch 10/20  316/316 - 1s - loss: nan
Epoch 11/20  316/316 - 1s - loss: nan
Epoch 12/20  316/316 - 1s - loss: nan
Epoch 13/20  316/316 - 1s - loss: nan
Epoch 14/20  316/316 - 1s - loss: nan
Epoch 15/20  316/316 - 1s - loss: nan
Epoch 16/20  316/316 - 1s - loss: nan
Epoch 17/20  316/316 - 1s - loss: nan
Epoch 18/20  316/316 - 1s - loss: nan
Epoch 19/20  316/316 - 1s - loss: nan
Epoch 20/20  316/316 - 1s - loss: nan
I'm more familiar with PyTorch than Keras. However, there are still a few things I would recommend trying:
Check your data. Ensure that there are no missing or null values in the data that you pass into your model. This is the most likely culprit; a single NaN value in the inputs or labels will make the loss NaN. A quick check could look like the snippet below.
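For example, using the names from your question, and assuming X_t_reshaped and train_labels are numeric NumPy arrays, something like this would tell you right away whether bad values are present:

import numpy as np

# Missing values per column in the original features
print(FeaturesDataFrame.isnull().sum())

# NaN or infinite values in the arrays actually passed to fit()
print(np.isnan(X_t_reshaped).any(), np.isinf(X_t_reshaped).any())
print(np.isnan(train_labels).any(), np.isinf(train_labels).any())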
You could try lowering the learning rate (0.001 or something even smaller) and/or removing gradient clipping. I've actually had gradient clipping be the cause of NaN loss before. See the sketch below.
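As a rough sketch, you could recompile with a smaller learning rate and without clipping and see whether the loss becomes finite:

from tensorflow.keras.optimizers import SGD

# Smaller learning rate, no clipvalue
opt = SGD(learning_rate=0.001, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)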
Try scaling your data (though unscaled data will usually cause infinite losses rather than NaN losses). Use StandardScaler or one of the other scalers in sklearn, as in the sketch below.
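A minimal sketch, assuming your inputs come straight from FeaturesDataFrame and are reshaped to (samples, 1, features) to match your input_shape:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit the scaler on the training features, then transform them
scaled = scaler.fit_transform(FeaturesDataFrame.values)
X_t_reshaped = scaled.reshape(-1, 1, scaled.shape[1])

It can also be worth scaling the labels if their magnitude is large.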
If all that fails, I'd try passing some very simple dummy data into the model and seeing whether the problem persists. Then you will know whether it is a code problem or a data problem; see the sketch below.
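For example, random data with the same shapes your model expects (the feature count taken from your input_shape):

import numpy as np

n_features = len(FeaturesDataFrame.columns)
X_dummy = np.random.rand(200, 1, n_features).astype('float32')
y_dummy = np.random.rand(200, 1).astype('float32')

# If the loss stays finite here, the issue is in your real data, not the model code
model.fit(X_dummy, y_dummy, epochs=5, batch_size=32, verbose=2)

Hope this helps, and feel free to ask questions if you have them.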