I would like to classify objects in an image, such as cars. There could be multiple cars in the image, and I would like to get the brand, color and type of each car.
I created a multi-branch CNN to identify each property and used a sigmoid activation function to get multiple labels.
For example, below is the code for one category:
# Shared convolutional trunk applied to the input tensor; both heads branch off of it.
# (data_augmentation, input_shape, input, num_category_classes and
# num_subcategory_classes are defined earlier in my script.)
import tensorflow as tf
from tensorflow.keras.regularizers import l2

shared_model = tf.keras.Sequential(
    [
        data_augmentation,
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    ]
)(input)

# Dense head for the "category" attribute (multi-label, hence sigmoid)
category_dense = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2(0.001)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_category_classes, activation="sigmoid", name="category_output"),
    ],
    name="category_sequential",
)(shared_model)

# Separate dense head for the "subcategory" attribute
subcategory_dense = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2(0.001)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_subcategory_classes, activation="sigmoid", name="subcategory_output"),
    ],
    name="subcategory_sequential",
)(shared_model)
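The two heads are then tied into a single multi-output model, roughly along these lines (a simplified sketch rather than my exact compile settings):

model = tf.keras.Model(inputs=input, outputs=[category_dense, subcategory_dense])
model.compile(
    optimizer="adam",
    loss={
        "category_sequential": "binary_crossentropy",
        "subcategory_sequential": "binary_crossentropy",
    },
)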
The above code works perfectly fine when there is one car in the image. However, things get complicated when there are multiple objects (cars). The output does find multiple cars in the image and produces multiple properties, but I lose the connection between those properties. For example, the output could be [ford, toyota] for brands and [red, blue] for colors, but I do not know which is which.
In summary, I would like to ask whether this is the right approach, or whether I have to use something like YOLO instead.
Ideally, I would prefer to stay away from a YOLO-like approach, because I would need to annotate coordinates in every sample picture, which is very time consuming compared to simply placing different car pictures under labelled folders. I only use images with one car for training, but I try to achieve correct identification for cases with multiple cars at test time.
Many thanks for your guidance/pointers,
Doug
The issue is that your code is an image classification model, not an object detection model. In other words, it classifies the whole image, not the specific objects inside it. That is why it breaks down as soon as there is more than one car.
"I would like to classify objects in an image such as cars."
Well, your code classifies the whole image, not the objects.
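To make that concrete, here is a rough sketch of one way to keep your existing per-car attribute classifier and still get per-car results: run a COCO-pretrained detector first (no extra box labelling on your side, since "car" is already a COCO class), then classify each detected crop so brand/color/type stay tied to a specific car. Treat the TF Hub handle, the output dictionary keys and the COCO class id for "car" as assumptions to verify against the detector's documentation, and match the crop preprocessing to whatever you used in training.

# Sketch: detect cars, then run the attribute model on each crop.
import tensorflow as tf
import tensorflow_hub as hub

# COCO-pretrained detector from TF Hub (handle and output keys assumed; check the docs).
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

CAR_CLASS_ID = 3  # "car" in the COCO label map (assumption to verify)

def classify_each_car(image, attribute_model, input_size=(128, 128), score_threshold=0.5):
    """image: uint8 array of shape (H, W, 3); attribute_model: your multi-output CNN."""
    h, w = image.shape[0], image.shape[1]
    detections = detector(tf.convert_to_tensor(image)[tf.newaxis, ...])
    boxes = detections["detection_boxes"][0].numpy()      # normalised [ymin, xmin, ymax, xmax]
    classes = detections["detection_classes"][0].numpy()
    scores = detections["detection_scores"][0].numpy()

    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if score < score_threshold or int(cls) != CAR_CLASS_ID:
            continue
        ymin, xmin = int(box[0] * h), int(box[1] * w)
        ymax, xmax = int(box[2] * h), int(box[3] * w)
        # Match this resizing/scaling to the preprocessing your model was trained with.
        crop = tf.image.resize(image[ymin:ymax, xmin:xmax], input_size) / 255.0
        category_pred, subcategory_pred = attribute_model(crop[tf.newaxis, ...])
        results.append(
            {
                "box": (ymin, xmin, ymax, xmax),
                "category": category_pred.numpy(),
                "subcategory": subcategory_pred.numpy(),
            }
        )
    return results

This way you keep training from your labelled folders of single-car images exactly as you do now, and the detector only supplies the crops at inference time.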