python, tensorflow, keras, deep-learning, functional-api

Multi-input Multi-output Model with Keras Functional API


As described in figure 1, I have 3 models, each of which applies to a particular domain.

The 3 models are trained separately on different datasets.

[figure 1]

Inference is sequential:

[figure: sequential inference over the three models]

I tried to parallelize the calls to these 3 models with Python's multiprocessing library, but it is very unstable and not recommended.

Here is the idea I came up with to do all of this in one pass:

Since the 3 models share a common pretrained backbone, I want to build a single model with multiple inputs and multiple outputs.

As the following drawing shows:

[figure: a single model with multiple inputs and multiple outputs]

That way, at inference time I call a single model that performs all 3 operations at once.

[figure: inference with the combined model]
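A rough sketch of what I want inference to look like (combined_model is hypothetical at this point; the same image batch would be fed to each input):

# one call returns the predictions of all three tasks
age_pred, gender_pred, emotion_pred = combined_model.predict([images, images, images])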

I saw that this is possible with the Keras Functional API, but I have no idea how to do it. The inputs of the datasets have the same dimensions: they are images of shape (200, 200, 3).

If anyone has an example of a multi-input, multi-output model that shares a common structure, that would be great.

UPDATE

Here is an example of my code, but it returns an error because the layers.concatenate(...) line propagates a shape that is not accepted by the EfficientNet model.

age_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="age_inputs")
gender_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="gender_inputs")
emotion_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="emotion_inputs")

inputs = layers.concatenate([age_inputs, gender_inputs, emotion_inputs])
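# note: layers.concatenate joins along the channel axis, so this tensor has shape (IMG_SIZE, IMG_SIZE, 9);
# the Conv2D below maps it back to 3 channels but, with the default 'valid' padding, shrinks the spatial size by 2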
inputs = layers.Conv2D(3, (3, 3), activation="relu")(inputs)
model = EfficientNetB0(include_top=False, input_tensor=inputs, weights="imagenet")

model.trainable = False

inputs = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
inputs = layers.BatchNormalization()(inputs)

top_dropout_rate = 0.2
inputs = layers.Dropout(top_dropout_rate, name="top_dropout")(inputs)

age_outputs = layers.Dense(1, activation="linear", 
                          name="age_pred")(inputs)
gender_outputs = layers.Dense(GENDER_NUM_CLASSES, 
                              activation="softmax", 
                              name="gender_pred")(inputs)
emotion_outputs = layers.Dense(EMOTION_NUM_CLASSES, activation="softmax", 
                             name="emotion_pred")(inputs)

model = keras.Model(inputs=[age_inputs, gender_inputs, emotion_inputs],
                    outputs=[age_outputs, gender_outputs, emotion_outputs],
                    name="EfficientNet")

optimizer = keras.optimizers.Adam(learning_rate=1e-2)
model.compile(loss={"age_pred" : "mse", 
                   "gender_pred":"categorical_crossentropy", 
                    "emotion_pred":"categorical_crossentropy"}, 
                   optimizer=optimizer, metrics=["accuracy"])

(age_train_images, age_train_labels), (age_test_images, age_test_labels) = reg_data_loader.load_data(...)
(gender_train_images, gender_train_labels), (gender_test_images, gender_test_labels) = cat_data_loader.load_data(...)
(emotion_train_images, emotion_train_labels), (emotion_test_images, emotion_test_labels) = cat_data_loader.load_data(...)

model.fit({'age_inputs': age_train_images, 'gender_inputs': gender_train_images, 'emotion_inputs': emotion_train_images},
          {'age_pred': age_train_labels, 'gender_pred': gender_train_labels, 'emotion_pred': emotion_train_labels},
          validation_split=0.2, epochs=5, batch_size=16)

Solution

  • We can do that easily in tf.keras using its Functional API. Here we will walk through how to build a multi-output model with different output types (classification and regression) using the Functional API.

    According to your last diagram, you need a model with one input and three outputs of different types. To demonstrate, we will use MNIST, a handwritten-digit dataset that is normally a 10-class classification problem. From it, we will additionally build a 2-class classifier (is a digit even or odd?) and a regression head (predicting the square of a digit, i.e. for an input image of a 9, it should output approximately 81).


    Data Set

    import numpy as np 
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    
    (xtrain, ytrain), (_, _) = keras.datasets.mnist.load_data()
    
    # 10 class classifier 
    y_out_a = keras.utils.to_categorical(ytrain, num_classes=10) 
    
    # 2 class classifier, even or odd 
    y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2) 
    
    # regression, predict square of an input digit image
    y_out_c = tf.square(tf.cast(ytrain, tf.float32))
    

    So, our training pairs will be xtrain and [y_out_a, y_out_b, y_out_c], the same as your last diagram.
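
    A quick shape check (the numbers below assume the standard 60,000-sample MNIST training split):

    print(xtrain.shape)   # (60000, 28, 28)
    print(y_out_a.shape)  # (60000, 10)
    print(y_out_b.shape)  # (60000, 2)
    print(y_out_c.shape)  # (60000,)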


    Model Building

    Let's build the model accordingly using the Functional API of tf.keras. See the model definition below. The MNIST samples are 28 x 28 grayscale images, so the input shape is set that way. Your dataset is probably RGB, so change the input dimensions accordingly.

    input = keras.Input(shape=(28, 28, 1), name="original_img")
    x = layers.Conv2D(16, 3, activation="relu")(input)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D(3)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling2D()(x)
    
    out_a = keras.layers.Dense(10, activation='softmax', name='10cls')(x)
    out_b = keras.layers.Dense(2, activation='softmax', name='2cls')(x)
    out_c = keras.layers.Dense(1, activation='linear', name='1rg')(x)
    
    encoder = keras.Model( inputs = input, outputs = [out_a, out_b, out_c], name="encoder")
    
    # Let's plot 
    keras.utils.plot_model(
        encoder
    )
    

    [plot of the encoder model: one convolutional trunk ending in three output heads]

    One thing to note: while defining out_a, out_b, and out_c in the model definition, we set their name arguments, which is very important. Their names are '10cls', '2cls', and '1rg' respectively. You can also see this in the diagram above (the last 3 tails).


    Compile and Run

    Now we can see why that name argument is important. To run the model, we first need to compile it with the proper loss functions, metrics, and optimizer. For classification and regression the optimizer can be the same, but the loss function and metrics have to differ. Since our model has outputs of mixed types (2 classification heads and 1 regression head), we need to set the proper loss and metric for each of them. Please see below how it's done.

    encoder.compile(
        loss = {
            "10cls": tf.keras.losses.CategoricalCrossentropy(),
            "2cls": tf.keras.losses.CategoricalCrossentropy(),
            "1rg": tf.keras.losses.MeanSquaredError()
        },
    
        metrics = {
            "10cls": 'accuracy',
            "2cls": 'accuracy',
            "1rg": 'mse'
        },
    
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    )
    

    Each output of the model is referred to here by its name, and we assign the appropriate loss and metric to each one. Hopefully this part is clear. Now it's time to train the model.

    encoder.fit(xtrain, [y_out_a, y_out_b, y_out_c], epochs=30, verbose=2)
    
    Epoch 1/30
    1875/1875 - 6s - loss: 117.7318 - 10cls_loss: 3.2642 - 2cls_loss: 0.9040 - 1rg_loss: 113.5637 - 10cls_accuracy: 0.6057 - 2cls_accuracy: 0.8671 - 1rg_mse: 113.5637
    Epoch 2/30
    1875/1875 - 5s - loss: 62.1696 - 10cls_loss: 0.5151 - 2cls_loss: 0.2437 - 1rg_loss: 61.4109 - 10cls_accuracy: 0.8845 - 2cls_accuracy: 0.9480 - 1rg_mse: 61.4109
    Epoch 3/30
    1875/1875 - 5s - loss: 50.3159 - 10cls_loss: 0.2804 - 2cls_loss: 0.1371 - 1rg_loss: 49.8985 - 10cls_accuracy: 0.9295 - 2cls_accuracy: 0.9641 - 1rg_mse: 49.8985
    ...
    ...
    Epoch 28/30
    1875/1875 - 5s - loss: 15.5841 - 10cls_loss: 0.1066 - 2cls_loss: 0.0891 - 1rg_loss: 15.3884 - 10cls_accuracy: 0.9726 - 2cls_accuracy: 0.9715 - 1rg_mse: 15.3884
    Epoch 29/30
    1875/1875 - 5s - loss: 15.2199 - 10cls_loss: 0.1058 - 2cls_loss: 0.0859 - 1rg_loss: 15.0281 - 10cls_accuracy: 0.9736 - 2cls_accuracy: 0.9727 - 1rg_mse: 15.0281
    Epoch 30/30
    1875/1875 - 5s - loss: 15.2178 - 10cls_loss: 0.1136 - 2cls_loss: 0.0854 - 1rg_loss: 15.0188 - 10cls_accuracy: 0.9722 - 2cls_accuracy: 0.9736 - 1rg_mse: 15.0188
    <tensorflow.python.keras.callbacks.History at 0x7ff42c18e110>
    

    That's how each output of the last layer is optimized by its own loss function. FYI, one thing worth mentioning: there is a useful parameter you might need when compiling the model, loss_weights, which weights the loss contributions of the different model outputs. See my other answer here on this.
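
    As an illustration, a compile call with loss_weights might look like this (the weight values are arbitrary, chosen only to show the mechanics):

    encoder.compile(
        loss={
            "10cls": tf.keras.losses.CategoricalCrossentropy(),
            "2cls": tf.keras.losses.CategoricalCrossentropy(),
            "1rg": tf.keras.losses.MeanSquaredError()
        },
        # down-weight the regression loss so it does not dominate the total loss
        loss_weights={"10cls": 1.0, "2cls": 1.0, "1rg": 0.01},
        metrics={"10cls": 'accuracy', "2cls": 'accuracy', "1rg": 'mse'},
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
    )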


    Prediction / Inference

    Let's see some output. We expect this model to predict 3 things: (1) what the digit is, (2) whether it is even or odd, and (3) its square value.

    import matplotlib.pyplot as plt
    plt.imshow(xtrain[0])
    

    [image: the first MNIST training sample, a handwritten 5]

    If we want to quickly check the output layers of our model:

    encoder.output
    
    [<KerasTensor: shape=(None, 10) dtype=float32 (created by layer '10cls')>,
     <KerasTensor: shape=(None, 2) dtype=float32 (created by layer '2cls')>,
     <KerasTensor: shape=(None, 1) dtype=float32 (created by layer '1rg')>]
    

    Passing xtrain[0] (which we know is a 5) to the model to make predictions:

    # add a batch dimension
    pred10, pred2, pred1 = encoder.predict(tf.expand_dims(xtrain[0], 0))
    
    # regression: square of the input digit image
    pred1 
    array([[22.098022]], dtype=float32)
    
    # even or odd, surely odd 
    pred2.argmax()
    0
    
    # which number, surely 5
    pred10.argmax()
    5
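    
    # map the 2-class prediction back to a readable label
    # (recall that class 1 was "even" in the encoding above)
    ["odd", "even"][pred2.argmax()]
    'odd'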
    

    Update

    Based on your comment, we can extend the above model to take multiple inputs too. A few things need to change. To demonstrate, we will feed the train and test samples of the MNIST dataset to the model as two separate inputs.

    (xtrain, ytrain), (xtest, _) = keras.datasets.mnist.load_data()
    
    xtrain = xtrain[:10000] # both inputs must have the same number of samples
    ytrain = ytrain[:10000] # both inputs must have the same number of samples
    
    y_out_a = keras.utils.to_categorical(ytrain, num_classes=10)
    y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2)
    y_out_c = tf.square(tf.cast(ytrain, tf.float32))
    
    print(xtrain.shape, xtest.shape) 
    print(y_out_a.shape, y_out_b.shape, y_out_c.shape)
    # (10000, 28, 28) (10000, 28, 28)
    # (10000, 10) (10000, 2) (10000,)
    

    Next, we need to modify some parts of the above model to take multiple inputs. If you plot the model again afterwards, you will see the new graph.

    input0 = keras.Input(shape=(28, 28, 1), name="img2")
    input1 = keras.Input(shape=(28, 28, 1), name="img1")
    concate_input = layers.Concatenate()([input0, input1])
    
    x = layers.Conv2D(16, 3, activation="relu")(concate_input)
    ...
    ...
    ...
    # multi-input , multi-output
    encoder = keras.Model( inputs = [input0, input1], 
                           outputs = [out_a, out_b, out_c], name="encoder")
    

    [plot of the multi-input, multi-output model]

    Now, we can train the model as follows

    # multi-input, multi-output
    encoder.fit([xtrain, xtest], [y_out_a, y_out_b, y_out_c], 
                 epochs=30, batch_size = 256, verbose=2)
    
    Epoch 1/30
    40/40 - 1s - loss: 66.9731 - 10cls_loss: 0.9619 - 2cls_loss: 0.4412 - 1rg_loss: 65.5699 - 10cls_accuracy: 0.7627 - 2cls_accuracy: 0.8815 - 1rg_mse: 65.5699
    Epoch 2/30
    40/40 - 0s - loss: 60.5408 - 10cls_loss: 0.8959 - 2cls_loss: 0.3850 - 1rg_loss: 59.2598 - 10cls_accuracy: 0.7794 - 2cls_accuracy: 0.8928 - 1rg_mse: 59.2598
    Epoch 3/30
    40/40 - 0s - loss: 57.3067 - 10cls_loss: 0.8586 - 2cls_loss: 0.3669 - 1rg_loss: 56.0813 - 10cls_accuracy: 0.7856 - 2cls_accuracy: 0.8951 - 1rg_mse: 56.0813
    ...
    ...
    Epoch 28/30
    40/40 - 0s - loss: 29.1198 - 10cls_loss: 0.4775 - 2cls_loss: 0.2573 - 1rg_loss: 28.3849 - 10cls_accuracy: 0.8616 - 2cls_accuracy: 0.9131 - 1rg_mse: 28.3849
    Epoch 29/30
    40/40 - 0s - loss: 27.5318 - 10cls_loss: 0.4696 - 2cls_loss: 0.2518 - 1rg_loss: 26.8104 - 10cls_accuracy: 0.8645 - 2cls_accuracy: 0.9142 - 1rg_mse: 26.8104
    Epoch 30/30
    40/40 - 0s - loss: 27.1581 - 10cls_loss: 0.4620 - 2cls_loss: 0.2446 - 1rg_loss: 26.4515 - 10cls_accuracy: 0.8664 - 2cls_accuracy: 0.9158 - 1rg_mse: 26.4515
    

    Now we can test the multi-input model and get multiple outputs from it.

    pred10, pred2, pred1 = encoder.predict(
        [
             tf.expand_dims(xtrain[0], 0),
             tf.expand_dims(xtrain[0], 0)
        ]
    )
    
    # regression part 
    pred1
    array([[25.13295]], dtype=float32)
    
    # even or odd 
    pred2.argmax()
    0
    
    # what digit 
    pred10.argmax()
    5
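
    Mapping this back to your model from the question: with inputs named age_inputs, gender_inputs, and emotion_inputs and outputs age_pred, gender_pred, and emotion_pred, inference becomes a single call once the model builds. A sketch (face_images is a hypothetical batch of (200, 200, 3) images, fed to all three inputs):

    age, gender, emotion = model.predict({
        "age_inputs": face_images,
        "gender_inputs": face_images,
        "emotion_inputs": face_images,
    })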