python, tensorflow, keras, deep-learning, functional-api

Multi-input Multi-output Model with Keras Functional API


As described in figure 1, I have 3 models, each of which applies to a particular domain.

The 3 models are trained separately on different datasets.

[figure 1]

Inference is sequential:

[figure: sequential inference over the three models]

I tried to parallelize the calls to these 3 models with Python's multiprocessing library, but it is very unstable and not recommended.

Here is the idea I came up with to do all of this in one pass:

Since the 3 models share a common pretrained backbone, I want to build a single model with multiple inputs and multiple outputs.

As the following drawing shows:

[figure: a single model with multiple inputs and multiple outputs]

That way, at inference time I call a single model that performs all 3 operations at once.

[figure: inference with the combined model]
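A rough sketch of what I want inference to look like (combined_model is hypothetical at this point; the same image batch would be fed to each input):

# one call returns the predictions of all three tasks
age_pred, gender_pred, emotion_pred = combined_model.predict([images, images, images])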

I saw that this is possible with the Keras Functional API, but I have no idea how to do it. The inputs of the datasets have the same dimensions: they are images of shape (200, 200, 3).

If anyone has an example of a multi-input, multi-output model that shares a common structure, that would be great.

UPDATE

Here is an example of my code, but it returns an error because the layers.concatenate(...) line propagates a shape that is not accepted by the EfficientNet model.

age_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="age_inputs")
gender_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="gender_inputs")
emotion_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="emotion_inputs")

inputs = layers.concatenate([age_inputs, gender_inputs, emotion_inputs])
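# note: layers.concatenate joins along the channel axis, so this tensor has shape (IMG_SIZE, IMG_SIZE, 9);
# the Conv2D below maps it back to 3 channels but, with the default 'valid' padding, shrinks the spatial size by 2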
inputs = layers.Conv2D(3, (3, 3), activation="relu")(inputs)
model = EfficientNetB0(include_top=False, input_tensor=inputs, weights="imagenet")

model.trainable = False

inputs = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
inputs = layers.BatchNormalization()(inputs)

top_dropout_rate = 0.2
inputs = layers.Dropout(top_dropout_rate, name="top_dropout")(inputs)

age_outputs = layers.Dense(1, activation="linear", 
                          name="age_pred")(inputs)
gender_outputs = layers.Dense(GENDER_NUM_CLASSES, 
                              activation="softmax", 
                              name="gender_pred")(inputs)
emotion_outputs = layers.Dense(EMOTION_NUM_CLASSES, activation="softmax", 
                             name="emotion_pred")(inputs)

model = keras.Model(inputs=[age_inputs, gender_inputs, emotion_inputs],
                    outputs=[age_outputs, gender_outputs, emotion_outputs],
                    name="EfficientNet")

optimizer = keras.optimizers.Adam(learning_rate=1e-2)
model.compile(loss={"age_pred" : "mse", 
                   "gender_pred":"categorical_crossentropy", 
                    "emotion_pred":"categorical_crossentropy"}, 
                   optimizer=optimizer, metrics=["accuracy"])

(age_train_images, age_train_labels), (age_test_images, age_test_labels) = reg_data_loader.load_data(...)
(gender_train_images, gender_train_labels), (gender_test_images, gender_test_labels) = cat_data_loader.load_data(...)
(emotion_train_images, emotion_train_labels), (emotion_test_images, emotion_test_labels) = cat_data_loader.load_data(...)

model.fit({'age_inputs': age_train_images, 'gender_inputs': gender_train_images, 'emotion_inputs': emotion_train_images},
          {'age_pred': age_train_labels, 'gender_pred': gender_train_labels, 'emotion_pred': emotion_train_labels},
          validation_split=0.2, epochs=5, batch_size=16)

Solution

  • We can do that easily in tf.keras using its Functional API. Here we will walk through how to build a multi-output model with different output types (classification and regression) using the Functional API.

    According to your last diagram, you need a model with one input and three outputs of different types. To demonstrate, we will use MNIST, a handwritten-digit dataset that is normally a 10-class classification problem. From it, we will additionally build a 2-class classifier (is a digit even or odd?) and a regression head (predicting the square of a digit, i.e. for an input image of a 9, it should output approximately 81).


    Data Set

    import numpy as np 
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    
    (xtrain, ytrain), (_, _) = keras.datasets.mnist.load_data()
    
    # 10 class classifier 
    y_out_a = keras.utils.to_categorical(ytrain, num_classes=10) 
    
    # 2 class classifier, even or odd 
    y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2) 
    
    # regression, predict square of an input digit image
    y_out_c = tf.square(tf.cast(ytrain, tf.float32))
    

    So, our training pairs will be xtrain and [y_out_a, y_out_b, y_out_c], the same as your last diagram.
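
    A quick shape check (the numbers below assume the standard 60,000-sample MNIST training split):

    print(xtrain.shape)   # (60000, 28, 28)
    print(y_out_a.shape)  # (60000, 10)
    print(y_out_b.shape)  # (60000, 2)
    print(y_out_c.shape)  # (60000,)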


    Model Building

    Let's build the model accordingly using the Functional API of tf.keras. See the model definition below. The MNIST samples are 28 x 28 grayscale images, so the input shape is set that way. Your dataset is probably RGB, so change the input dimensions accordingly.

    input = keras.Input(shape=(28, 28, 1), name="original_img")
    x = layers.Conv2D(16, 3, activation="relu")(input)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D(3)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling2D()(x)
    
    out_a = keras.layers.Dense(10, activation='softmax', name='10cls')(x)
    out_b = keras.layers.Dense(2, activation='softmax', name='2cls')(x)
    out_c = keras.layers.Dense(1, activation='linear', name='1rg')(x)
    
    encoder = keras.Model( inputs = input, outputs = [out_a, out_b, out_c], name="encoder")
    
    # Let's plot 
    keras.utils.plot_model(
        encoder
    )
    

    [plot of the encoder model: one convolutional trunk ending in three output heads]

    One thing to note: while defining out_a, out_b, and out_c in the model definition, we set their name arguments, which is very important. Their names are '10cls', '2cls', and '1rg' respectively. You can also see this in the diagram above (the last 3 tails).


    Compile and Run

    Now we can see why that name argument is important. To run the model, we first need to compile it with the proper loss functions, metrics, and optimizer. For classification and regression the optimizer can be the same, but the loss function and metrics have to differ. Since our model has outputs of mixed types (2 classification heads and 1 regression head), we need to set the proper loss and metric for each of them. Please see below how it's done.

    encoder.compile(
        loss = {
            "10cls": tf.keras.losses.CategoricalCrossentropy(),
            "2cls": tf.keras.losses.CategoricalCrossentropy(),
            "1rg": tf.keras.losses.MeanSquaredError()
        },
    
        metrics = {
            "10cls": 'accuracy',
            "2cls": 'accuracy',
            "1rg": 'mse'
        },
    
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    )
    

    Each output of the model is referred to here by its name, and we assign the appropriate loss and metric to each one. Hopefully this part is clear. Now it's time to train the model.

    encoder.fit(xtrain, [y_out_a, y_out_b, y_out_c], epochs=30, verbose=2)
    
    Epoch 1/30
    1875/1875 - 6s - loss: 117.7318 - 10cls_loss: 3.2642 - 2cls_loss: 0.9040 - 1rg_loss: 113.5637 - 10cls_accuracy: 0.6057 - 2cls_accuracy: 0.8671 - 1rg_mse: 113.5637
    Epoch 2/30
    1875/1875 - 5s - loss: 62.1696 - 10cls_loss: 0.5151 - 2cls_loss: 0.2437 - 1rg_loss: 61.4109 - 10cls_accuracy: 0.8845 - 2cls_accuracy: 0.9480 - 1rg_mse: 61.4109
    Epoch 3/30
    1875/1875 - 5s - loss: 50.3159 - 10cls_loss: 0.2804 - 2cls_loss: 0.1371 - 1rg_loss: 49.8985 - 10cls_accuracy: 0.9295 - 2cls_accuracy: 0.9641 - 1rg_mse: 49.8985
    ...
    ...
    Epoch 28/30
    1875/1875 - 5s - loss: 15.5841 - 10cls_loss: 0.1066 - 2cls_loss: 0.0891 - 1rg_loss: 15.3884 - 10cls_accuracy: 0.9726 - 2cls_accuracy: 0.9715 - 1rg_mse: 15.3884
    Epoch 29/30
    1875/1875 - 5s - loss: 15.2199 - 10cls_loss: 0.1058 - 2cls_loss: 0.0859 - 1rg_loss: 15.0281 - 10cls_accuracy: 0.9736 - 2cls_accuracy: 0.9727 - 1rg_mse: 15.0281
    Epoch 30/30
    1875/1875 - 5s - loss: 15.2178 - 10cls_loss: 0.1136 - 2cls_loss: 0.0854 - 1rg_loss: 15.0188 - 10cls_accuracy: 0.9722 - 2cls_accuracy: 0.9736 - 1rg_mse: 15.0188
    <tensorflow.python.keras.callbacks.History at 0x7ff42c18e110>
    

    That's how each output of the last layer is optimized by its own loss function. FYI, one thing worth mentioning: there is a useful parameter you might need when compiling the model, loss_weights, which weights the loss contributions of the different model outputs. See my other answer here on this.
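
    As an illustration, a compile call with loss_weights might look like this (the weight values are arbitrary, chosen only to show the mechanics):

    encoder.compile(
        loss={
            "10cls": tf.keras.losses.CategoricalCrossentropy(),
            "2cls": tf.keras.losses.CategoricalCrossentropy(),
            "1rg": tf.keras.losses.MeanSquaredError()
        },
        # down-weight the regression loss so it does not dominate the total loss
        loss_weights={"10cls": 1.0, "2cls": 1.0, "1rg": 0.01},
        metrics={"10cls": 'accuracy', "2cls": 'accuracy', "1rg": 'mse'},
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
    )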


    Prediction / Inference

    Let's see some output. We expect this model to predict 3 things: (1) what the digit is, (2) whether it is even or odd, and (3) its square value.

    import matplotlib.pyplot as plt
    plt.imshow(xtrain[0])
    

    [image: the first MNIST training sample, a handwritten 5]

    If we want to quickly check the output layers of our model:

    encoder.output
    
    [<KerasTensor: shape=(None, 10) dtype=float32 (created by layer '10cls')>,
     <KerasTensor: shape=(None, 2) dtype=float32 (created by layer '2cls')>,
     <KerasTensor: shape=(None, 1) dtype=float32 (created by layer '1rg')>]
    

    Passing xtrain[0] (which we know is a 5) to the model to make predictions:

    # add a batch dimension
    pred10, pred2, pred1 = encoder.predict(tf.expand_dims(xtrain[0], 0))
    
    # regression: square of the input digit image
    pred1 
    array([[22.098022]], dtype=float32)
    
    # even or odd, surely odd 
    pred2.argmax()
    0
    
    # which number, surely 5
    pred10.argmax()
    5
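    
    # map the 2-class prediction back to a readable label
    # (recall that class 1 was "even" in the encoding above)
    ["odd", "even"][pred2.argmax()]
    'odd'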
    

    Update

    Based on your comment, we can extend the above model to take multiple inputs too. A few things need to change. To demonstrate, we will feed the train and test samples of the MNIST dataset to the model as two separate inputs.

    (xtrain, ytrain), (xtest, _) = keras.datasets.mnist.load_data()
    
    xtrain = xtrain[:10000] # both inputs must have the same number of samples
    ytrain = ytrain[:10000] # both inputs must have the same number of samples
    
    y_out_a = keras.utils.to_categorical(ytrain, num_classes=10)
    y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2)
    y_out_c = tf.square(tf.cast(ytrain, tf.float32))
    
    print(xtrain.shape, xtest.shape) 
    print(y_out_a.shape, y_out_b.shape, y_out_c.shape)
    # (10000, 28, 28) (10000, 28, 28)
    # (10000, 10) (10000, 2) (10000,)
    

    Next, we need to modify some parts of the above model to take multiple inputs. If you plot the model again afterwards, you will see the new graph.

    input0 = keras.Input(shape=(28, 28, 1), name="img2")
    input1 = keras.Input(shape=(28, 28, 1), name="img1")
    concate_input = layers.Concatenate()([input0, input1])
    
    x = layers.Conv2D(16, 3, activation="relu")(concate_input)
    ...
    ...
    ...
    # multi-input , multi-output
    encoder = keras.Model( inputs = [input0, input1], 
                           outputs = [out_a, out_b, out_c], name="encoder")
    

    [plot of the multi-input, multi-output model]

    Now, we can train the model as follows

    # multi-input, multi-output
    encoder.fit([xtrain, xtest], [y_out_a, y_out_b, y_out_c], 
                 epochs=30, batch_size = 256, verbose=2)
    
    Epoch 1/30
    40/40 - 1s - loss: 66.9731 - 10cls_loss: 0.9619 - 2cls_loss: 0.4412 - 1rg_loss: 65.5699 - 10cls_accuracy: 0.7627 - 2cls_accuracy: 0.8815 - 1rg_mse: 65.5699
    Epoch 2/30
    40/40 - 0s - loss: 60.5408 - 10cls_loss: 0.8959 - 2cls_loss: 0.3850 - 1rg_loss: 59.2598 - 10cls_accuracy: 0.7794 - 2cls_accuracy: 0.8928 - 1rg_mse: 59.2598
    Epoch 3/30
    40/40 - 0s - loss: 57.3067 - 10cls_loss: 0.8586 - 2cls_loss: 0.3669 - 1rg_loss: 56.0813 - 10cls_accuracy: 0.7856 - 2cls_accuracy: 0.8951 - 1rg_mse: 56.0813
    ...
    ...
    Epoch 28/30
    40/40 - 0s - loss: 29.1198 - 10cls_loss: 0.4775 - 2cls_loss: 0.2573 - 1rg_loss: 28.3849 - 10cls_accuracy: 0.8616 - 2cls_accuracy: 0.9131 - 1rg_mse: 28.3849
    Epoch 29/30
    40/40 - 0s - loss: 27.5318 - 10cls_loss: 0.4696 - 2cls_loss: 0.2518 - 1rg_loss: 26.8104 - 10cls_accuracy: 0.8645 - 2cls_accuracy: 0.9142 - 1rg_mse: 26.8104
    Epoch 30/30
    40/40 - 0s - loss: 27.1581 - 10cls_loss: 0.4620 - 2cls_loss: 0.2446 - 1rg_loss: 26.4515 - 10cls_accuracy: 0.8664 - 2cls_accuracy: 0.9158 - 1rg_mse: 26.4515
    

    Now we can test the multi-input model and get multiple outputs from it.

    pred10, pred2, pred1 = encoder.predict(
        [
             tf.expand_dims(xtrain[0], 0),
             tf.expand_dims(xtrain[0], 0)
        ]
    )
    
    # regression part 
    pred1
    array([[25.13295]], dtype=float32)
    
    # even or odd 
    pred2.argmax()
    0
    
    # what digit 
    pred10.argmax()
    5
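
    Mapping this back to your model from the question: with inputs named age_inputs, gender_inputs, and emotion_inputs and outputs age_pred, gender_pred, and emotion_pred, inference becomes a single call once the model builds. A sketch (face_images is a hypothetical batch of (200, 200, 3) images, fed to all three inputs):

    age, gender, emotion = model.predict({
        "age_inputs": face_images,
        "gender_inputs": face_images,
        "emotion_inputs": face_images,
    })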