python arrays tensorflow sparse-matrix tf.keras

Confused about how tf.keras.Sequential works in TensorFlow – especially activation and input_shape

I'm learning TensorFlow to build machine learning models in Python. I tried following the official documentation on creating a simple classification model, but I couldn't clearly understand the tf.keras.Sequential function.

Could you explain how it works in detail, or provide a simple, practical example for beginners?

I tried following the TensorFlow documentation and built a basic neural network model using this structure:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

I was expecting this simple example to clearly demonstrate how a Sequential model works, allowing me to move forward in learning TensorFlow. However, I wasn't completely sure about the role of each part of the model, especially the parameters like activation and input_shape.

I didn't receive errors, but my confusion remained regarding the logic behind using Sequential. My expectation was a deeper understanding of why and how each layer interacts within this method.

Solution

The tf.keras.Sequential class in TensorFlow is one way to build neural network models by stacking layers in a linear, step-by-step fashion. It is most suitable when each layer has exactly one input tensor and one output tensor, which is typical for straightforward feedforward neural networks.
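To make the "stacking" idea concrete, here is a minimal sketch, assuming TensorFlow 2.x and NumPy are installed. The variable names (hidden, output, x) and the random batch are made up for illustration. It checks that calling the Sequential model gives the same result as passing the data through each layer by hand, in order:

```python
import numpy as np
import tensorflow as tf

# Two layers created up front so they can be reused outside the model.
hidden = tf.keras.layers.Dense(64, activation='relu')
output = tf.keras.layers.Dense(10, activation='softmax')

# Sequential simply chains the layers: the output of one feeds the next.
model = tf.keras.Sequential([hidden, output])

# A made-up batch of 5 samples with 784 features each (illustrative data only).
x = tf.constant(np.random.rand(5, 784).astype('float32'))

y_model = model(x).numpy()            # forward pass through the whole model
y_manual = output(hidden(x)).numpy()  # same layers applied one after another

same = np.allclose(y_model, y_manual)
print(same, y_model.shape)
```

Because both paths use the same layer objects (and therefore the same weights), the two results should match. That is essentially all Sequential does: it applies the layers in the order you list them.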

In your example, you are constructing a model with two layers. The first is a dense (fully connected) layer with 64 neurons that uses the ReLU activation function. The input_shape=(784,) parameter indicates that each input sample is a one-dimensional array of 784 values (for instance, a flattened 28×28 image). The batch dimension is not part of input_shape; the model accepts any number of such samples at once.
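As a rough check of what the input shape implies, the sketch below (assuming TensorFlow 2.x) feeds a dummy batch through the same architecture and counts the trainable weights. Declaring the shape with tf.keras.Input instead of input_shape is my choice for this sketch; it should be equivalent here, and newer Keras releases seem to prefer it. The all-zeros batch is placeholder data.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # one sample = 784 values
    tf.keras.layers.Dense(64, activation='relu'),     # 784 -> 64
    tf.keras.layers.Dense(10, activation='softmax'),  # 64 -> 10
])

# Placeholder batch of 3 samples; the batch size is not part of the input shape.
batch = tf.zeros((3, 784))
batch_output_shape = tuple(model(batch).shape)

# Each Dense layer has (inputs * units) weights plus one bias per unit.
expected_params = (784 * 64 + 64) + (64 * 10 + 10)
actual_params = model.count_params()

print(batch_output_shape, actual_params, expected_params)
```

The first layer needs to know there are 784 inputs so it can create a 784×64 weight matrix. Later layers infer their input size from the layer before them, which is why only the first layer needs a shape.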

The second layer is another dense layer with 10 neurons, one for each class in a 10-class classification problem (for example, the digits 0–9). It uses the softmax activation function, which converts the raw output scores from the neurons into probabilities that sum to 1, so the model's prediction is the class with the highest probability.
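The softmax step can be illustrated without TensorFlow at all. The sketch below uses plain NumPy and three made-up scores instead of ten to keep it short. The softmax helper is written here for illustration and is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up raw scores for 3 classes (the model in the question would have 10).
logits = np.array([2.0, 1.0, 0.1])

probs = softmax(logits)
predicted_class = int(np.argmax(probs))  # index of the most probable class

print(probs, probs.sum(), predicted_class)
```

The largest raw score always ends up with the largest probability, so picking the argmax of the softmax output selects the class the model considers most likely (class 0 in this toy input).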