python · tensorflow · keras · lstm · lstm-stateful

Keras stateful LSTM error: Specified a list with shape [4,1] from a tensor with shape [32,1]


With the code below I get an error when running the prediction on X_test. Fitting completes without problems; the error occurs while y_pred = model.predict(X_test) is executed.

X_train.shape
-> (784, 300, 7)
y_train.shape
-> (784, 300, 1)

X_test.shape
-> (124, 300, 7)
y_test.shape
-> (124, 300, 1)


from keras.models import Sequential
from keras.layers import Masking, Bidirectional, LSTM, TimeDistributed, Dense
from keras.optimizers import Adam

batchsize = 4
model = Sequential()
model.add(Masking(mask_value=0, batch_input_shape=(batchsize, len(X_train[0]), len(X_train[0][0]))))
model.add(Bidirectional(LSTM(200, return_sequences=True, stateful=True)))
model.add(TimeDistributed(Dense(len(classdict))))
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=Adam(0.001),
              metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=1, batch_size=batchsize)
y_pred = model.predict(X_test)

Error:

Traceback (most recent call last):
  File "lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 55, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Specified a list with shape [4,1] from a tensor with shape [32,1]
     [[{{node TensorArrayUnstack_1/TensorListFromTensor}}]]
     [[sequential/bidirectional/backward_lstm/PartitionedCall]] [Op:__inference_predict_function_13360]

In my opinion, the error could be related to the chosen batch size. In my research I read that the number of samples has to be divisible by the batch size for stateful LSTMs. So I looked for the greatest common divisor of the number of samples in X_train and X_test: GCD(124, 784) = 4 = batchsize. However, now this error occurs. I have already tried different batch sizes, but then other errors occur. Does anyone have an idea or a fix for this?
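(As a side note, the divisibility reasoning above can be checked directly with the standard library; this is just a sketch of the same GCD computation mentioned in the question:)

```python
import math

# A batch size that divides both sample counts must divide their GCD.
n_train = 784  # samples in X_train
n_test = 124   # samples in X_test

common = math.gcd(n_train, n_test)
print(common)                    # -> 4
print(n_train % common == 0)     # -> True
print(n_test % common == 0)      # -> True
```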


Solution

  • You just need to make sure that the number of samples (in both the training and the test set) can be divided evenly by the batch size. Here is a working example:

    import tensorflow as tf
    
    X_train = tf.random.normal((784, 300, 7))
    y_train = tf.random.normal((784, 300, 1))
    X_test = tf.random.normal((124,300,7))
    
    batchsize = 4
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0, batch_input_shape=(batchsize, len(X_train[0]),len(X_train[0][0]))))
    model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(200, return_sequences=True, stateful=True)))
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)))
    model.compile(loss="mse",
                  optimizer=tf.keras.optimizers.Adam(0.001))
    model.summary()
    model.fit(X_train, y_train, batch_size=batchsize, epochs=1)
    
    Model: "sequential_5"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     masking_5 (Masking)         (4, 300, 7)               0         
                                                                     
     bidirectional_5 (Bidirectio  (4, 300, 400)            332800    
     nal)                                                            
                                                                     
     time_distributed_5 (TimeDis  (4, 300, 1)              401       
     tributed)                                                       
                                                                     
    =================================================================
    Total params: 333,201
    Trainable params: 333,201
    Non-trainable params: 0
    _________________________________________________________________
    196/196 [==============================] - 47s 131ms/step - loss: 1.0052
    <keras.callbacks.History at 0x7fee901f8950>
    

    Also explicitly set the batch size when predicting, because according to the docs, batch_size defaults to 32 if left unspecified. That is exactly the shape mismatch in the error: the model was built for batches of 4, but predict fed it batches of 32 ([4,1] vs. [32,1]):

    X_test = tf.random.normal((124,300,7))
    y_pred = model.predict(X_test, batch_size=batchsize) 
    print(y_pred.shape)
    
    (124, 300, 1)
    

    Also make sure your batch size is the same everywhere: in batch_input_shape, in fit, and in predict.
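
    Since the model is stateful, it also carries its LSTM hidden state across batches; it is usually a good idea to call reset_states() between fitting and predicting so leftover training state does not leak into inference. A minimal sketch (smaller layer and sequence sizes than above, purely for illustration):

    ```python
    import tensorflow as tf

    batchsize = 2
    model = tf.keras.Sequential([
        # Stateful LSTM: the fixed batch size is baked into the input shape.
        tf.keras.layers.LSTM(8, return_sequences=True, stateful=True,
                             batch_input_shape=(batchsize, 10, 3)),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
    ])
    model.compile(loss="mse", optimizer="adam")

    X = tf.random.normal((4, 10, 3))   # 4 samples, divisible by batchsize=2
    y = tf.random.normal((4, 10, 1))

    model.fit(X, y, batch_size=batchsize, epochs=1, verbose=0)
    model.reset_states()               # clear carried-over hidden state
    y_pred = model.predict(X, batch_size=batchsize, verbose=0)
    print(y_pred.shape)                # (4, 10, 1)
    ```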