[SOLVED] Keras Conv1D input for a very large number of samples

I have a dataset with a very large number of samples: 1,686,663 rows and 107 features, i.e. shape (1686663, 107). I'm building a neural network with Keras and want to apply a 1D convolution (Conv1D).

The input for Conv1D is 3D: (batch_size, steps, channels). Here the 107 features act as the steps, with a single channel. I understood the batch size to be the number of samples, but in my case I cannot use all the samples at once because they don't fit in my RAM, so I selected a batch size of 512.

in_shape = (batch_size, x_train.shape[1], 1)

Hence, my input shape is now (512, 107, 1).

I reshaped the training data to match the input the convolution expects:

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
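A minimal NumPy sketch of that reshape, using a small random array as a stand-in for the real (1686663, 107) matrix (the 1000-row size is an arbitrary assumption for illustration). It only adds a trailing channel axis, so the same call works unchanged on the full dataset.

```python
import numpy as np

# Stand-in for the real (1686663, 107) training matrix; 1000 rows is arbitrary.
x_train = np.random.rand(1000, 107).astype("float32")

# Add a trailing channel axis: (samples, 107) -> (samples, 107, 1).
# Note the attribute access x_train.shape[1], not a separate x_train_shape variable.
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)

# An equivalent, shorter spelling would be x_train[..., np.newaxis].
print(x_train.shape)  # (1000, 107, 1)
```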

When I run the training, I get the following error:

ValueError: Input 0 of layer "sequential_10" is incompatible with the layer: expected shape=(None, 512, 107, 1), found shape=(None, 107, 1)

Could anyone tell me what I am missing here?

Solution

When you specify the input shape, either by adding a tf.keras.Input layer as the first layer or by setting the input_shape argument directly on the first layer of your model, you should not include the batch size. In your case it would be:

in_shape = (x_train.shape[1], 1)
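A minimal sketch of a model built with that per-sample shape, assuming TensorFlow 2.x with its bundled Keras. The layer sizes, the pooling layer, the Dense head, the loss and the random stand-in data are illustrative assumptions, not part of the original question; the point is that Input receives only (107, 1) and Keras reports the batch axis as None.

```python
import numpy as np
import tensorflow as tf

n_features = 107
in_shape = (n_features, 1)  # per-sample shape only: no batch dimension

# Illustrative architecture; filters, kernel_size and the head are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=in_shape),
    tf.keras.layers.Conv1D(filters=16, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Keras reports the batch axis as None: any batch size is accepted.
print(model.input_shape)  # (None, 107, 1)

# Random stand-in data in the (samples, 107, 1) layout from the question.
x = np.random.rand(2048, n_features, 1).astype("float32")
y = np.random.rand(2048, 1).astype("float32")

# The batch size is given here, to fit(), not in the input shape.
history = model.fit(x, y, batch_size=512, epochs=1, verbose=0)
```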

Keras automatically adds the batch dimension as the first axis of the input (shown as None, meaning any size), and the actual number of samples per batch comes from the batch_size argument you pass to fit().
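To make the role of batch_size concrete, here is a small arithmetic sketch for the question's dataset, assuming fit() is called on in-memory arrays and keeps the final partial batch (Keras does not drop the remainder for NumPy inputs). The batch size is the number of samples per training step, not the total number of samples.

```python
import math

n_samples = 1686663   # rows in the question's dataset
batch_size = 512      # value passed to fit(batch_size=...)

# fit() walks through the data in mini-batches of batch_size rows;
# the last batch holds whatever rows are left over.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch = n_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)  # 3295 135
```

The smaller final batch of 135 samples is accepted only because the model's batch axis is left as None rather than fixed to 512.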

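If you instead write in_shape = (batch_size, x_train.shape[1], 1), the batch size ends up in the shape twice: Keras treats 512 as part of each sample's shape and then prepends its own batch dimension.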

That is what the error says: the model expected (None, 512, 107, 1), where None is the batch dimension, but received data of shape (None, 107, 1). The extra 512 is there because the batch size was added twice.
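The shape arithmetic behind that error can be reproduced in plain Python. Both helpers below are hypothetical and only mimic how Keras prepends a None batch axis; they are not part of the Keras API.

```python
def model_input_shape(in_shape):
    """Hypothetical helper: mimic Keras prepending a None batch axis to in_shape."""
    return (None,) + tuple(in_shape)


def fed_shape(data_shape):
    """Hypothetical helper: shape of one fed batch, with the batch axis shown as None."""
    return (None,) + tuple(data_shape[1:])


batch_size, n_features = 512, 107
data_shape = (1686663, n_features, 1)  # x_train after the reshape

wrong = model_input_shape((batch_size, n_features, 1))  # batch size included by hand
right = model_input_shape((n_features, 1))              # per-sample shape only
found = fed_shape(data_shape)

print(wrong, found)    # (None, 512, 107, 1) (None, 107, 1)  -> mismatch, as in the error
print(right == found)  # True
```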