I'm new to Semantic Segmentation and I'm having a problem.
I have the 'imgs_3d' and 'segmentation' lists, which hold the images I'll use for training.
X_train, X_test, y_train, y_test = train_test_split(imgs_3d, segmentation, train_size=0.8, random_state=0)
When I apply np.array(X_train).shape and np.array(y_train).shape, I get the following result for both: (598, 1024, 2041, 3).
My intention is to train a simple CNN model just to learn more about the methods, and then apply U-Net.
What I have in mind is: train the model with the normal images as X and the segmentation images as y, so that after training the model outputs the segmented image.
But when I try to do it:
model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, verbose=1)
I get the error ValueError: Dimensions must be equal, but are 1024 and 341 for '{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)' with input shapes: [?,1024,2041,3], [?,341,680,32].
I don't know why this is happening. Please help me (with details if possible)! Thanks!
There are several issues in your code:
1. Model output
Your provided model architecture is as follows:
model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
The input shape is (1024, 2041, 3). After passing through the Conv2D (with padding='same'), the shape becomes (1024, 2041, 32). After passing through the MaxPooling2D with pool_size=(3, 3), the spatial dimensions are divided by 3, so the model output is (341, 680, 32). This output shape doesn't match what your segmentation task requires, which is exactly what the error message is telling you.
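You can verify these shapes yourself by rebuilding the same two layers and printing the model summary:

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Same layers as in the question, just to inspect the output shape of each layer.
model = keras.Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), padding='same', input_shape=(1024, 2041, 3)),
    MaxPooling2D(pool_size=(3, 3)),
])
model.summary()
# conv2d        -> (None, 1024, 2041, 32)
# max_pooling2d -> (None, 341, 680, 32)   since 1024 // 3 = 341 and 2041 // 3 = 680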
In segmentation tasks, the model output should have the same shape as y_train. For example, if the shape of y_train is (1024, 2041, 20), your model's output shape should also be (1024, 2041, 20).
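For instance, if you want to keep a downsampling layer such as MaxPooling2D, you need to bring the spatial resolution back up before the output layer. Here is a minimal sketch of that idea; the (1024, 2048) input size (2041 is odd, so I assume the images are resized first) and num_classes = 5 are illustrative assumptions, not values from your post:

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

num_classes = 5  # hypothetical number of segmentation classes

model = keras.Sequential([
    Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(1024, 2048, 3)),
    MaxPooling2D(pool_size=(2, 2)),                     # (512, 1024, 32)
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    UpSampling2D(size=(2, 2)),                          # back to (1024, 2048, 32)
    Conv2D(num_classes, (1, 1), activation='softmax'),  # (1024, 2048, num_classes)
])
model.summary()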
2. Loss Function
You seem to be training a multi-class segmentation model, so you should use categorical_crossentropy instead of binary_crossentropy.
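As a side note (your post doesn't say how the masks are encoded), the exact loss depends on that encoding: categorical_crossentropy expects one-hot masks with one channel per class, while sparse_categorical_crossentropy expects integer class ids. A hypothetical 4-class toy example on a 2x2 "image" to illustrate the difference:

import tensorflow as tf

# One-hot masks, shape (batch, H, W, num_classes)
y_true_onehot = tf.constant([[[[1., 0., 0., 0.], [0., 1., 0., 0.]],
                              [[0., 0., 1., 0.], [0., 0., 0., 1.]]]])  # (1, 2, 2, 4)
y_pred = tf.random.uniform([1, 2, 2, 4])
y_pred = y_pred / tf.reduce_sum(y_pred, axis=-1, keepdims=True)        # pretend softmax output

# categorical_crossentropy works on the one-hot masks directly
print(tf.keras.losses.CategoricalCrossentropy()(y_true_onehot, y_pred).numpy())

# sparse_categorical_crossentropy wants integer class ids, shape (batch, H, W)
y_true_ids = tf.argmax(y_true_onehot, axis=-1)                          # (1, 2, 2)
print(tf.keras.losses.SparseCategoricalCrossentropy()(y_true_ids, y_pred).numpy())  # same value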
Here is a very simple example:

import tensorflow as tf
from tensorflow import keras

# padding='same' keeps the spatial size, 5 filters give one channel per class,
# and softmax makes them per-class probabilities for categorical_crossentropy.
model = keras.Sequential([
    keras.layers.Conv2D(filters=5, kernel_size=(3, 3), padding='same',
                        activation='softmax', input_shape=(1024, 2041, 3)),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Dummy data, just to demonstrate that the input and target shapes match.
X_train = tf.random.normal([8, 1024, 2041, 3])
y_train = tf.random.normal([8, 1024, 2041, 5])
history = model.fit(X_train, y_train, epochs=5, verbose=1)
In this example, the shape of y_train is (1024, 2041, 5), so the model is designed to produce an output with the same shape as y_train (the softmax activation on the last layer turns the five channels into per-class probabilities). Additionally, categorical_crossentropy is used for training the model.
For medical image segmentation tasks, we would consider more complex architectures like U-Net. If you encounter further issues with the model architecture, you can refer to GitHub repositories that focus on medical image segmentation.
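To give a rough idea of what that looks like, here is a deliberately tiny U-Net-style sketch (two levels, few filters). The (256, 256, 3) input size and num_classes = 5 are assumptions for illustration, not a tuned architecture:

from tensorflow import keras
from tensorflow.keras import layers

def tiny_unet(input_shape=(256, 256, 3), num_classes=5):
    inputs = keras.Input(shape=input_shape)

    # Encoder: convolve, then downsample
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    p1 = layers.MaxPooling2D(2)(c1)                      # (128, 128, 16)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D(2)(c2)                      # (64, 64, 32)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)

    # Decoder: upsample and concatenate the matching encoder features (skip connections)
    u2 = layers.UpSampling2D(2)(b)                       # (128, 128, 64)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(32, 3, padding='same', activation='relu')(u2)
    u1 = layers.UpSampling2D(2)(c3)                      # (256, 256, 32)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)

    # Per-pixel class probabilities, same spatial size as the input
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(c4)
    return keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

The Concatenate calls are the skip connections that distinguish U-Net from a plain encoder/decoder like the one sketched earlier.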