Tags: keras, lstm, recurrent-neural-network, tf.keras, lstm-stateful

Can I split my long sequences into 3 smaller ones and use a stateful LSTM for 3 samples?


I am doing a time-series sequence classification problem.

I have 80 time series, each of length 1002. Each sequence corresponds to one of 4 categories (copper, cadmium, lead, mercury). I want to model this with Keras LSTMs. These models require data in the form [batches, timesteps, features]. Since each sequence is independent, the most basic setup is for X_train to have shape [80, 1002, 1]. This works fine in an LSTM with stateful=False.

But 1002 is quite a long sequence length, and a smaller one might perform better.

Let's say I split each sequence into 3 chunks of length 334. I could continue to use a stateless LSTM, but (I think?) it makes sense to make it stateful across the 3 chunks and then reset the state, since the chunks are related.

How do I implement this in Keras?

First, I transform the data into shape [240, 334, 1] with a simple X_train.reshape(-1, 334, 1), but how do I maintain the state across 3 samples and then reset it in model.fit()?
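For reference, the reshape itself (plus my guess at the label handling, which is part of what I'm asking about below) would be something like this. X_chunks and y_chunks are just names I made up, and I'm assuming y_train is one-hot with shape [80, 4]:

import numpy as np

# [80, 1002, 1] -> [240, 334, 1]: each sequence becomes 3 consecutive chunks
X_chunks = X_train.reshape(-1, 334, 1)

# my guess: repeat each one-hot label once per chunk so the shapes line up
y_chunks = np.repeat(y_train, 3, axis=0)  # [80, 4] -> [240, 4]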

I know I need to call model.reset_states() somewhere, but I couldn't find any sample code showing how to do it. Do I have to subclass a model? Can I do this with for epoch in range(num_epochs) and GradientTape? What are my options?
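To make the GradientTape option concrete, here is roughly what I imagine the custom loop would look like. This is just my sketch, not verified code; in particular, feeding all 80 sequences as one stateful batch, transposing the chunks to [3, 80, 334, 1], and taking the loss only on the final chunk's output are my own assumptions:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# stateful LSTM: batch size fixed at 80, so each batch holds one chunk of
# every sequence and the state carries over from chunk to chunk
model = Sequential([
    LSTM(10, batch_input_shape=(80, 334, 1), stateful=True),
    Dense(4, activation='sigmoid')
])
optimizer = tf.keras.optimizers.RMSprop()
loss_fn = tf.keras.losses.CategoricalCrossentropy()

# [80, 1002, 1] -> [3, 80, 334, 1]: chunks[k] holds chunk k of all 80 sequences
chunks = X_train.reshape(80, 3, 334, 1).transpose(1, 0, 2, 3)

for epoch in range(num_epochs):
    with tf.GradientTape() as tape:
        for k in range(3):  # run the 3 related chunks in order
            preds = model(chunks[k], training=True)
        loss = loss_fn(y_train, preds)  # loss on the final chunk's predictions
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    model.reset_states()  # forget the carried state before the next epoch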

Also, if I split the sequences up, what do I do with the labels? Do I repeat each label once per chunk (3 times in this case)? Is there a way for an LSTM to ingest 3 samples and then emit a single prediction, or does each sample have to correspond to a prediction?

Finally, if I split my sequences into 3 subsequences, does my batch size have to be 3, or can it be any multiple of 3?

Here is the super basic code I used with X_train.shape == (80, 1002, 1).

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(10, batch_input_shape=(10, 1002, 1)),  # 10 samples per batch
    Dense(4, activation='sigmoid')
])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=10, shuffle=False)

I know there are loads of questions here, happy to make separate ones if this is too much for one.


Solution

  • The easy solution is to reshape the data from having 1 feature to having 3.

    Turn [80, 1002, 1] into [80, 334, 3] rather than [240, 334, 1]. Each timestep then carries 3 consecutive original values as its features. This keeps the number of samples the same, so you don't have to mess around with statefulness, and you can keep using the normal fit() API.
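
    A minimal sketch of that, assuming X_train still has shape [80, 1002, 1] and y_train has shape [80, 4]. X_train3 is just a name used here, and I've swapped sigmoid for softmax, the usual choice for 4 mutually exclusive classes:

    import numpy as np
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # [80, 1002, 1] -> [80, 334, 3]: each timestep now carries 3 consecutive
    # original values as its features; sample count and labels are unchanged
    X_train3 = X_train.reshape(80, 334, 3)

    model = Sequential([
        LSTM(10, input_shape=(334, 3)),  # stateless, no fixed batch size needed
        Dense(4, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model.fit(X_train3, y_train, epochs=3, batch_size=10)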