I'm new to Semantic Segmentation and I'm having a problem.
I have the 'imgs_3d' and 'segmentation' lists, which hold the images I'll use for training.
X_train, X_test, y_train, y_test = train_test_split(imgs_3d, segmentation, train_size=0.8, random_state=0)
When I apply np.array(X_train).shape and np.array(y_train).shape, I get the following result for both: (598, 1024, 2041, 3).
My intention is to train a simple CNN model just to learn more about the methods, and then apply U-Net.
What I have in mind is: train the model with the normal images as X and the segmentation images as y, so that after training the model outputs the segmented image.
But when I try to do it:
model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, verbose=1)
I get the error ValueError: Dimensions must be equal, but are 1024 and 341 for '{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)' with input shapes: [?,1024,2041,3], [?,341,680,32].
I don't know why this is happening. Please help me (with details if possible)! Thanks!
There are several issues in your code:
1. Model output
Your provided model architecture is as follows:
model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
The input shape is (1024, 2041, 3). After passing through the Conv2D (with padding='same'), the shape becomes (1024, 2041, 32). After passing through the MaxPooling2D with pool_size=(3, 3), the spatial dimensions are divided by 3, so the model output is (341, 680, 32). This output shape doesn't match what your segmentation task requires, which is exactly what the error message is telling you.
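You can verify these shapes yourself by rebuilding the same two layers and printing the model summary:

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Same layers as in the question, just to inspect the output shape of each layer.
model = keras.Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
model.summary()
# conv2d        -> (None, 1024, 2041, 32)
# max_pooling2d -> (None, 341, 680, 32)   since 1024 // 3 = 341 and 2041 // 3 = 680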
In segmentation tasks, the model output should have the same shape as y_train. For example, if the shape of y_train is (1024, 2041, 20), your model's output shape should also be (1024, 2041, 20).
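For instance, if you want to keep a downsampling layer such as MaxPooling2D, you need to bring the spatial resolution back up before the output layer. Here is a minimal sketch of that idea; the (1024, 2048) input size (2041 is odd, so I assume the images are resized first) and num_classes = 5 are illustrative assumptions, not values from your post:

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

num_classes = 5  # hypothetical number of segmentation classes

model = keras.Sequential([
    Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(1024, 2048, 3)),
    MaxPooling2D(pool_size=(2, 2)),                     # (512, 1024, 32)
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    UpSampling2D(size=(2, 2)),                          # back to (1024, 2048, 32)
    Conv2D(num_classes, (1, 1), activation='softmax'),  # (1024, 2048, num_classes)
])
model.summary()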
2. Loss Function
You seem to be training a multi-class segmentation model, so you should use categorical_crossentropy instead of binary_crossentropy.
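As a side note (your post doesn't say how the masks are encoded), the exact loss depends on that encoding: categorical_crossentropy expects one-hot masks with one channel per class, while sparse_categorical_crossentropy expects integer class ids. A hypothetical 4-class toy example on a 2x2 "image" to illustrate the difference:

import tensorflow as tf

# One-hot masks, shape (batch, H, W, num_classes)
y_true_onehot = tf.constant([[[[1., 0., 0., 0.], [0., 1., 0., 0.]],
                              [[0., 0., 1., 0.], [0., 0., 0., 1.]]]])  # (1, 2, 2, 4)
y_pred = tf.random.uniform([1, 2, 2, 4])
y_pred = y_pred / tf.reduce_sum(y_pred, axis=-1, keepdims=True)        # pretend softmax output

# categorical_crossentropy works on the one-hot masks directly
print(tf.keras.losses.CategoricalCrossentropy()(y_true_onehot, y_pred).numpy())

# sparse_categorical_crossentropy wants integer class ids, shape (batch, H, W)
y_true_ids = tf.argmax(y_true_onehot, axis=-1)                          # (1, 2, 2)
print(tf.keras.losses.SparseCategoricalCrossentropy()(y_true_ids, y_pred).numpy())  # same value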
Here is a very simple example:

import tensorflow as tf
from tensorflow import keras

# padding='same' keeps the spatial size, 5 filters give one channel per class,
# and softmax makes them per-class probabilities for categorical_crossentropy.
model = keras.Sequential([
    keras.layers.Conv2D(filters=5, kernel_size=(3, 3), padding='same',
                        activation='softmax', input_shape=(1024, 2041, 3)),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Dummy data, just to demonstrate that the input and target shapes match.
X_train = tf.random.normal([8, 1024, 2041, 3])
y_train = tf.random.normal([8, 1024, 2041, 5])
history = model.fit(X_train, y_train, epochs=5, verbose=1)
In this example, the shape of y_train is (1024, 2041, 5), so the model is designed to produce an output with the same shape as y_train (the softmax activation on the last layer turns the five channels into per-class probabilities). Additionally, categorical_crossentropy is used for training the model.
For medical image segmentation tasks, we would consider more complex architectures like U-Net. If you encounter further issues with the model architecture, you can refer to GitHub repositories that focus on medical image segmentation.
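To give a rough idea of what that looks like, here is a deliberately tiny U-Net-style sketch (two levels, few filters). The (256, 256, 3) input size and num_classes = 5 are assumptions for illustration, not a tuned architecture:

from tensorflow import keras
from tensorflow.keras import layers

def tiny_unet(input_shape=(256, 256, 3), num_classes=5):
    inputs = keras.Input(shape=input_shape)

    # Encoder: convolve, then downsample
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    p1 = layers.MaxPooling2D(2)(c1)                      # (128, 128, 16)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D(2)(c2)                      # (64, 64, 32)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)

    # Decoder: upsample and concatenate the matching encoder features (skip connections)
    u2 = layers.UpSampling2D(2)(b)                       # (128, 128, 64)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(32, 3, padding='same', activation='relu')(u2)
    u1 = layers.UpSampling2D(2)(c3)                      # (256, 256, 32)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)

    # Per-pixel class probabilities, same spatial size as the input
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(c4)
    return keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

The Concatenate calls are the skip connections that distinguish U-Net from a plain encoder/decoder like the one sketched earlier.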