keras, deep-learning, lstm, lstm-stateful

Training many-to-many stateful LSTM with and without final dense layer


I am trying to train a recurrent model in Keras containing an LSTM for regression purposes. I would like to use the model online and, as far as I understand, I need to train a stateful LSTM. Since the model has to output a sequence of values, I hope the loss is computed on every expected output vector. However, I fear my code does not work this way, and I would be grateful if anyone could help me understand whether I am doing this right or whether there is a better approach.

The input to the model is a sequence of 128-dimensional vectors. Each sequence in the training set has a different length. At each time step, the model should output a vector of 3 elements.

I am trying to train and compare two models: A) a simple LSTM with 128 inputs and 3 outputs; B) a simple LSTM with 128 inputs and 100 outputs + a dense layer with 3 outputs.

For model A) I wrote the following code:

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam
import numpy as np

# Model
model = Sequential()
model.add(LSTM(3, batch_input_shape=(1, None, 128), return_sequences=True, activation="linear", stateful=True))
model.compile(loss='mean_squared_error', optimizer=Adam())

# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j] # j-th sequences
        X = X[np.newaxis, ...] # X has size 1 x NTimes x 128

        Y = dataY[j] # Y has size NTimes x 3

        history = model.fit(X, Y, epochs=1, batch_size=1, verbose=0, shuffle=False)
        model.reset_states()

With this code, model A) seems to train fine because the output sequence approaches the ground-truth sequence on the training set. However, I wonder if the loss is really computed by considering all NTimes output vectors.
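
A quick way to check this (a minimal sketch, assuming the model, data and dataY defined above) is to compare the loss Keras reports with a mean squared error computed by hand over all NTimes predictions; if the two numbers match, every output vector contributes to the loss:

import numpy as np

# Sanity check (sketch): does the reported loss cover every timestep?
X_check = data[0][np.newaxis, ...]   # 1 x NTimes x 128
Y_check = dataY[0][np.newaxis, ...]  # 1 x NTimes x 3

model.reset_states()
keras_loss = model.evaluate(X_check, Y_check, batch_size=1, verbose=0)

model.reset_states()
pred = model.predict(X_check, batch_size=1)  # 1 x NTimes x 3
model.reset_states()

manual_mse = np.mean((pred - Y_check) ** 2)  # average over all timesteps and outputs
print(keras_loss, manual_mse)  # if these match, all NTimes output vectors enter the loss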

For model B), I could not find any way to get the entire output sequence due to the dense layer. Hence, I wrote:

# Model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(1, None, 128), stateful=True))
model.add(Dense(3, activation="linear"))
model.compile(loss='mean_squared_error', optimizer=Adam())

# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j]  #j-th sequence
        X = X[np.newaxis, ...] # X has size 1 x NTimes x 128

        Y = dataY[j] # Y has size NTimes x 3

        loss = 0.0
        for h in range(X.shape[1]):
            x = X[0, h, :]
            x = x[np.newaxis, np.newaxis, ...] # h-th vector in j-th sequence, shape 1 x 1 x 128
            y = Y[h, :]
            y = y[np.newaxis, ...] # shape 1 x 3
            loss += model.train_on_batch(x, y)
        model.reset_states() #After the end of the sequence

With this code, model B) does not train well. The training does not seem to converge, and the loss values increase and decrease cyclically. I have also tried using only the last vector as Y and then calling the fit function on the whole training sequence X, but with no improvement.

Any idea? Thank you!


Solution

  • If you still want three outputs per step of your sequence, you need to wrap your Dense layer in TimeDistributed, like so:

    model.add(TimeDistributed(Dense(3, activation="linear")))
    

    This applies the dense layer to each timestep independently.

    See https://keras.io/layers/wrappers/#timedistributed
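
    For completeness, here is a minimal sketch of model B) rewritten this way. Note that the LSTM must also return the full sequence (return_sequences=True) so that TimeDistributed sees one output per timestep; the training loop then mirrors model A), one whole sequence per fit call. The names data, dataY, n_epoch and n_sequences are taken from the question:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, TimeDistributed
    from keras.optimizers import Adam
    import numpy as np

    # Model B with a per-timestep dense head (sketch)
    model = Sequential()
    model.add(LSTM(100, batch_input_shape=(1, None, 128), return_sequences=True, stateful=True))
    model.add(TimeDistributed(Dense(3, activation="linear")))
    model.compile(loss='mean_squared_error', optimizer=Adam())

    # Training mirrors model A: one whole sequence per fit call, then reset the state
    for i in range(n_epoch):
        for j in np.random.permutation(n_sequences):
            X = data[j][np.newaxis, ...]   # 1 x NTimes x 128
            Y = dataY[j][np.newaxis, ...]  # 1 x NTimes x 3
            model.fit(X, Y, epochs=1, batch_size=1, verbose=0, shuffle=False)
            model.reset_states()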