I'm trying to solve the following problem:
My data is shaped (number of device recordings, 3000, 4).
I'm trying to produce a vector of length 3000 where each data point is one of 3 labels (y1, y2, y3), so my desired output shape is (number of device recordings, 3000, 1). I have labeled data for training.
I'm trying to use an LSTM model for this, since "classification as I move along time-series data" seems like an RNN type of problem.
I have my network set up like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(3, input_shape=(3000, 4), return_sequences=True))
model.add(LSTM(3, activation='softmax', return_sequences=True))
model.summary()
and the summary looks like this:
Model: "sequential_23"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_29 (LSTM) (None, 3000, 3) 96
_________________________________________________________________
lstm_30 (LSTM) (None, 3000, 3) 84
=================================================================
Total params: 180
Trainable params: 180
Non-trainable params: 0
_________________________________________________________________
All looks well in the output space, since I can use the softmax output at each time step to determine which of my three categories belongs to that particular step (I think).
But I only have 180 trainable parameters, so I'm guessing that I am doing something horribly wrong.
Questions:
In a simplistic viewpoint, you can consider an LSTM layer as an augmented Dense layer with a memory (hence enabling efficient processing of sequences). So the concept of "units" is the same for both: the number of neurons or feature units of these layers, in other words, the number of distinctive features these layers can extract from the input.
Therefore, when you set the number of units to 3 for the LSTM layer, it more or less means that this layer can only extract 3 distinctive features from the input timesteps (note that the number of units has nothing to do with the length of the input sequence, i.e. the entire input sequence will be processed by the LSTM layer no matter what the number of units or the sequence length is).
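As a side note, this also explains the parameter counts you are seeing: an LSTM layer's parameter count depends only on the number of units and the number of input features, never on the sequence length. A quick sanity check, assuming the standard Keras LSTM formulation (4 gates, each with an input kernel, a recurrent kernel, and a bias):

```python
# Keras LSTM parameter count: 4 gates, each with an input kernel
# (input_dim x units), a recurrent kernel (units x units), and a bias (units).
def lstm_params(input_dim, units):
    return 4 * (input_dim * units + units * units + units)

print(lstm_params(4, 3))  # first layer (4 input features, 3 units): 96
print(lstm_params(3, 3))  # second layer (3 input features, 3 units): 84
```

So your 180 total parameters are exactly what a 3-unit + 3-unit stack should have; the model isn't broken, it's just very small.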
Usually, this is sub-optimal (though it really depends on the difficulty of the specific problem and dataset you are working on; maybe 3 units are enough for your problem/dataset, and you should experiment to find out). Therefore, a higher number is often chosen for the number of units (common choices: 32, 64, 128, 256), and the classification task is delegated to a dedicated Dense layer (sometimes called the "softmax layer") at the top of the model.
For example, considering the description of your problem, a model with 3 stacked LSTM layers and a Dense classification layer on top might look like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))
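Since a Dense layer on a 3-D input is applied independently to every timestep, this outputs one softmax distribution over your 3 classes per timestep, i.e. shape (None, 3000, 3). To train against your (number of recordings, 3000, 1) labels, sparse_categorical_crossentropy lets you keep them as integer class ids (0, 1, 2) instead of one-hot vectors. A minimal sketch, using random stand-in data in place of your real recordings:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical stand-in data: 8 recordings, 3000 timesteps, 4 features,
# and one integer class id (0, 1, or 2) per timestep.
num_recordings = 8
x = np.random.rand(num_recordings, 3000, 4).astype("float32")
y = np.random.randint(0, 3, size=(num_recordings, 3000, 1))

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))

# sparse_categorical_crossentropy takes integer labels, applied per timestep.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=2)
```

At prediction time, `model.predict(x).argmax(axis=-1)` recovers the per-timestep class ids.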