[SOLVED] Why Keras Input behaves differently than a regular tensor

I have an embedding layer and a GRU layer in Keras as follows:

embedding_layer = tf.keras.layers.Embedding(5000, 256, mask_zero=True)
gru_layer = tf.keras.layers.GRU(256, return_sequences=True, recurrent_initializer='glorot_uniform')

When I give the following inputs

A1 = np.random.random((64, 29))
A2 = embedding_layer(A1)
A3 = gru_layer(A2)
print(A1.shape, A2.shape, A3.shape)

everything is fine and I get

(64, 29) (64, 29, 256) (64, 29, 256)

But when I do

y2 = tf.keras.Input(shape=(64,29))
print(y2.shape)
y3 = embedding_layer(y2)
print(y3.shape)
y4 = gru_layer(y3)
print(y4.shape)

The first two print statements are fine and I get

(None, 64, 29)
(None, 64, 29, 256)

but then I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[125], line 5
      3 y3 = embedding_layer(y2)
      4 print(y3.shape)
----> 5 y4 = gru_layer(y3)
      6 print(y4.shape)

File /opt/conda/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:123, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    120     filtered_tb = _process_traceback_frames(e.__traceback__)
    121     # To get the full stack trace, call:
    122     # `keras.config.disable_traceback_filtering()`
--> 123     raise e.with_traceback(filtered_tb) from None
    124 finally:
    125     del filtered_tb

File /opt/conda/lib/python3.10/site-packages/keras/src/layers/input_spec.py:186, in assert_input_compatibility(input_spec, inputs, layer_name)
    184 if spec.ndim is not None and not spec.allow_last_axis_squeeze:
    185     if ndim != spec.ndim:
--> 186         raise ValueError(
    187             f'Input {input_index} of layer "{layer_name}" '
    188             "is incompatible with the layer: "
    189             f"expected ndim={spec.ndim}, found ndim={ndim}. "
    190             f"Full shape received: {shape}"
    191         )
    192 if spec.max_ndim is not None:
    193     if ndim is not None and ndim > spec.max_ndim:

ValueError: Input 0 of layer "gru_17" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 64, 29, 256)

Why does a Keras Input behave differently from a regular tensor, and why do I get this error? Also, why is the shape of these tensors printed as (None, 64, 29) rather than (64, 29)?

Solution

keras.Input expects the shape as the first argument and the batch size as the second argument:

keras.Input(
    shape=None,
    batch_size=None,
    ...
)

shape: A shape tuple (tuple of integers or None objects), not including the batch size.

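So in keras.Input(shape=(64, 29)) the 64 is not treated as the batch size. It becomes part of each sample's shape, and Keras prepends its own batch dimension. That is why the shapes print as (None, 64, 29): the leading None is the batch dimension, left unspecified so that any batch size is accepted. The embedding layer then produces a 4-D tensor (None, 64, 29, 256), while the GRU expects a 3-D input (batch, timesteps, features), hence the error. In the NumPy case, A1 has shape (64, 29) and the 64 already is the batch dimension, so everything lines up.

To fix it, initialize the input with keras.Input(shape=(29,)).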
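
As a minimal sketch of the fix, the snippet below rebuilds the two layers from the question and feeds them an Input whose shape leaves out the batch dimension. The dtype="int32" argument and the second input with batch_size=64 are additions for illustration and are not part of the original question. The names z2 and z4 are made up here.

```python
import tensorflow as tf

# Same layers as in the question.
embedding_layer = tf.keras.layers.Embedding(5000, 256, mask_zero=True)
gru_layer = tf.keras.layers.GRU(
    256, return_sequences=True, recurrent_initializer="glorot_uniform"
)

# shape excludes the batch dimension: each sample is a sequence of 29 token ids.
# dtype="int32" is an assumption here, chosen because Embedding looks up integer ids.
y2 = tf.keras.Input(shape=(29,), dtype="int32")
y3 = embedding_layer(y2)  # adds the embedding axis -> 3-D
y4 = gru_layer(y3)        # GRU accepts the 3-D input -> still 3-D
print(y2.shape, y3.shape, y4.shape)

# Optional: pin the batch size so the leading dimension is 64 instead of None.
z2 = tf.keras.Input(shape=(29,), batch_size=64, dtype="int32")
z4 = gru_layer(embedding_layer(z2))
print(z4.shape)
```

With shape=(29,) the leading dimension stays None unless batch_size is given, and the GRU receives the 3-D input it expects.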