Tags: python, keras, lstm, recurrent-neural-network, lstm-stateful

What is the relationship between batch size, timestep and error in LSTM (Keras)?


Let,

Sample Size = 100 (X1,X2,...,X100)

Timesteps = 5

Input Features = 10

Error Calculation:

How is the error calculated when batch size = sample size? My understanding: I will feed X1, X2, X3, X4, X5 into the LSTM and get an output after 5 timesteps, say Y1.

Error E1 = X6 - Y1. Similarly, I will calculate E2, E3, ..., E95.

Actual error = E1 + E2 + ... + E95. This will be used to update the weights.

Is this correct?
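For concreteness, here is a minimal NumPy sketch of the understanding above. The window construction and the summed error are assumptions taken from the question itself, and the random arrays stand in for real data and model outputs; note that in practice Keras averages a loss (e.g., mean squared error) over the batch rather than summing raw signed differences.

```python
import numpy as np

timesteps, n_features = 5, 10
X = np.random.randn(100, n_features)          # X1 ... X100, 10 features each

# Sliding windows per the question: window i = (Xi, ..., Xi+4) predicts Xi+5.
windows = np.stack([X[i:i + timesteps] for i in range(95)])  # shape (95, 5, 10)
targets = X[timesteps:]                                      # X6 ... X100, shape (95, 10)

Y = np.random.randn(95, n_features)           # stand-in for model predictions Y1 ... Y95

# Batch size = sample size: all 95 errors feed a single weight update.
errors = targets - Y                          # E1 ... E95 (element-wise, per feature)
actual_error = errors.sum()
```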

Error for Batch:

Based on the above understanding: if batch size = 10, then only E1, E2, E3, E4, and E5 will be used to calculate the actual error, and this will be used to update the weights.
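One caveat worth noting: Keras's batch_size counts prepared sequences (windows), not raw samples, so under that convention a batch size of 10 would contribute E1 through E10 to the first update. A hedged sketch of per-batch error accumulation, repeating the window construction from the sketch above:

```python
import numpy as np

timesteps, n_features, batch_size = 5, 10, 10
X = np.random.randn(100, n_features)
windows = np.stack([X[i:i + timesteps] for i in range(95)])  # (95, 5, 10)
targets = X[timesteps:]                                      # (95, 10)
preds = np.random.randn(95, n_features)       # stand-in for Y1 ... Y95

# model.fit(windows, targets, batch_size=10) performs one weight update per
# slice below; the first slice covers windows 1..10, i.e. errors E1..E10.
for start in range(0, len(windows), batch_size):
    batch_error = (targets[start:start + batch_size]
                   - preds[start:start + batch_size]).sum()
    # each weight update is driven by this batch's error alone
```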

Batching in stateful LSTM:

Batching allows the model to exploit parallelism: each entity in the batch calculates its error, and then all the errors are summed. How does an LSTM achieve parallelism within a batch if the LSTM is stateful (i.e., the hidden states at the end of one sequence are used to initialize the hidden states of the next sequence; is this understanding of stateful correct)?
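For reference, a minimal sketch of a stateful LSTM in Keras (layer sizes and data here are placeholders): with stateful=True, the final hidden/cell state of batch t seeds batch t+1 element-wise, so row i of each batch must be the continuation of row i of the previous batch.

```python
import numpy as np
from tensorflow import keras

timesteps, n_features, batch_size = 5, 10, 10

# stateful=True requires a fixed batch size, because state is kept per batch row.
model = keras.Sequential([
    keras.Input(shape=(timesteps, n_features), batch_size=batch_size),
    keras.layers.LSTM(32, stateful=True),     # 32 units: arbitrary placeholder
    keras.layers.Dense(n_features),
])
model.compile(optimizer="adam", loss="mse")

xb = np.random.randn(batch_size, timesteps, n_features)
yb = np.random.randn(batch_size, n_features)
model.train_on_batch(xb, yb)   # the state left here initializes the next batch
model.reset_states()           # clear state between independent sequences/epochs
```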

References:

LSTM Batches vs Timesteps

Understanding Keras LSTMs: Role of Batch-size and Statefulness

doubts regarding batch size and time steps in RNN


Solution

  • Batch size effect on LSTM: For batch size 1, the model takes 1 input at each timestep. For batch size n, the model takes n inputs at each timestep (see the shape sketch after this list).

    [Image for clarification omitted. Credit: Deeplearning.ai]

    Error calculation part mentioned in the question: that is the error calculation for batch size 1.

    Error for a batch: Sum up the errors of each element of the batch to get the final error.

    Batching in stateful LSTM: My understanding of parallelism was incorrect. Parallelism happens within a batch, not across batches.
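As promised in the first bullet, a short sketch of the batch-size effect (unit count and data are placeholders): whatever the batch size n, the layer consumes n input vectors per timestep in parallel and returns n outputs, and the n sequences in the batch share weights while keeping independent hidden states.

```python
import numpy as np
from tensorflow import keras

timesteps, n_features = 5, 10
lstm = keras.layers.LSTM(8)   # 8 units: arbitrary placeholder

# Batch size 1: one 10-feature vector enters the cell at each of the 5 timesteps.
out_1 = lstm(np.random.randn(1, timesteps, n_features).astype("float32"))
print(out_1.shape)    # (1, 8)

# Batch size 32: thirty-two vectors enter the cell at each timestep, in parallel.
out_32 = lstm(np.random.randn(32, timesteps, n_features).astype("float32"))
print(out_32.shape)   # (32, 8)
```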