Tags: python, tensorflow, keras, conv-neural-network

How to Build a Two-Branch Keras Model with Dense and Conv2D Layers?


This is a simple example that reproduces the issue I am having with a network I am trying to deploy.

I have an image input layer (which I need to keep), followed by a Dense layer, a Conv2D layer and another Dense layer.

The idea is that both the inputs and the labels are 10x10 images. The code is inspired by my own code and this example.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Conv2D

#Building model
size=10
a = Input(shape=(size,size,1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size = (3,3), filters = size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)

model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

#Create random data, accounting for the single channel dimension
n_images=55
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))

#Fit model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)

print(model.summary())

I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)


I don't get an error if I replace:

outputs = Dense(size, activation='sigmoid')(hidden)

with:

outputs = Dense(1, activation='sigmoid')(hidden)

I have no idea how Dense(1) is even valid here, nor how it can produce a 10x10 output, as model.summary() indicates:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_26 (InputLayer)        (None, 10, 10, 1)         0         
_________________________________________________________________
dense_93 (Dense)             (None, 10, 10, 10)        20        
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 10, 10, 100)       9100      
_________________________________________________________________
dense_94 (Dense)             (None, 10, 10, 1)         101       
=================================================================
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
_________________________________________________________________
None

Edit (moved from old comments):

  1. What I am trying to do isn't standard. I have a set of images, and for each image I want to find a binary image of the same size, where a pixel value of 1 means that the feature exists at that location in the input image.

  2. The decision of whether a pixel contains the feature should be based on both local information (extracted by convolution layers) and global information (extracted by Dense layers).


Solution

  • Well, according to your comments:

    What I am trying to do isn't standard. I have a set of images, and for each image I want to find a binary image of the same size, where a pixel value of 1 means that the feature exists at that location in the input image.

    The decision of whether a pixel contains the feature should be based on both local information (extracted by convolution layers) and global information (extracted by Dense layers).

    I guess you are looking to create a two-branch model where one branch consists of convolution layers and the other is simply one or more Dense layers stacked on top of each other (although I should mention that, in my opinion, a single convolutional network may achieve what you are looking for, because the combination of convolution and pooling layers, possibly followed by some up-sampling layers at the end, preserves both local and global information to some extent). To define such a model, you can use the Keras functional API like this:

    from keras import models
    from keras import layers
    
    input_image = layers.Input(shape=(10, 10, 1))
    
    # branch one: dense layers
    b1 = layers.Flatten()(input_image)
    b1 = layers.Dense(64, activation='relu')(b1)
    b1_out = layers.Dense(32, activation='relu')(b1)
    
    # branch two: conv + pooling layers
    b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
    b2 = layers.MaxPooling2D((2,2))(b2)
    b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
    b2_out = layers.MaxPooling2D((2,2))(b2)
    
    # merge two branches
    flattened_b2 = layers.Flatten()(b2_out)
    merged = layers.concatenate([b1_out, flattened_b2])
    
    # add a final dense layer
    output = layers.Dense(10*10, activation='sigmoid')(merged)
    output = layers.Reshape((10,10))(output)
    
    # create the model
    model = models.Model(input_image, output)
    
    model.compile(optimizer='rmsprop', loss='binary_crossentropy')
    model.summary()
    

    Model summary:

    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_1 (InputLayer)            (None, 10, 10, 1)    0                                            
    __________________________________________________________________________________________________
    conv2d_1 (Conv2D)               (None, 8, 8, 32)     320         input_1[0][0]                    
    __________________________________________________________________________________________________
    max_pooling2d_1 (MaxPooling2D)  (None, 4, 4, 32)     0           conv2d_1[0][0]                   
    __________________________________________________________________________________________________
    flatten_1 (Flatten)             (None, 100)          0           input_1[0][0]                    
    __________________________________________________________________________________________________
    conv2d_2 (Conv2D)               (None, 2, 2, 64)     18496       max_pooling2d_1[0][0]            
    __________________________________________________________________________________________________
    dense_1 (Dense)                 (None, 64)           6464        flatten_1[0][0]                  
    __________________________________________________________________________________________________
    max_pooling2d_2 (MaxPooling2D)  (None, 1, 1, 64)     0           conv2d_2[0][0]                   
    __________________________________________________________________________________________________
    dense_2 (Dense)                 (None, 32)           2080        dense_1[0][0]                    
    __________________________________________________________________________________________________
    flatten_2 (Flatten)             (None, 64)           0           max_pooling2d_2[0][0]            
    __________________________________________________________________________________________________
    concatenate_1 (Concatenate)     (None, 96)           0           dense_2[0][0]                    
                                                                     flatten_2[0][0]                  
    __________________________________________________________________________________________________
    dense_3 (Dense)                 (None, 100)          9700        concatenate_1[0][0]              
    __________________________________________________________________________________________________
    reshape_1 (Reshape)             (None, 10, 10)       0           dense_3[0][0]                    
    ==================================================================================================
    Total params: 37,060
    Trainable params: 37,060
    Non-trainable params: 0
    __________________________________________________________________________________________________
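
    By the way, regarding your original question about why Dense(1) works: a Dense layer in Keras is applied only to the last axis of its input, so Dense(1) maps the (10, 10, 100) output of the Conv2D layer to (10, 10, 1), which happens to match the shape of your labels, whereas Dense(10) produces (10, 10, 10) and causes the shape error you saw. A quick check (just an illustration, separate from the model above):

    from keras import models, layers

    # Dense acts on the last axis only: (10, 10, 100) -> (10, 10, 1)
    x = layers.Input(shape=(10, 10, 100))
    y = layers.Dense(1)(x)
    print(models.Model(x, y).output_shape)   # (None, 10, 10, 1)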
    

    Note that this is just one way of achieving what you are looking for, and it may or may not work well for the specific problem and data you have. You may modify this model (e.g. remove the pooling layers or add more Dense layers), or use a completely different architecture with other kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach better accuracy; a rough sketch of one such fully-convolutional variant is given below. In the end, you must experiment to find the best solution.
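
    For instance, a fully-convolutional variant along those lines might look like the following sketch (just an illustration of the up-sampling idea, with padding='same' so the 10x10 spatial size is preserved; the layer sizes are arbitrary and not tuned for your data):

    from keras import models
    from keras import layers

    input_image = layers.Input(shape=(10, 10, 1))

    # local features; padding='same' keeps the 10x10 spatial size
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_image)
    x = layers.MaxPooling2D((2, 2))(x)    # down to 5x5
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)    # back up to 10x10

    # one sigmoid unit per pixel for the binary feature map
    output = layers.Conv2D(1, (1, 1), activation='sigmoid')(x)

    fcn_model = models.Model(input_image, output)
    fcn_model.compile(optimizer='rmsprop', loss='binary_crossentropy')

    Keep in mind that this variant outputs shape (None, 10, 10, 1), so its labels would keep their channel dimension, unlike the two-branch model above.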

    Edit:

    For completeness, here is how to generate the data and fit the network:

    import numpy as np

    size = 10
    n_images = 10
    data = np.random.randint(0, 2, (n_images, size, size, 1))
    # the labels must match the model's (None, 10, 10) output shape
    labels = np.random.randint(0, 2, (n_images, size, size))
    model.fit(data, labels, verbose=1, batch_size=32, epochs=20)
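
    After training, the sigmoid outputs can be thresholded (e.g. at 0.5) to turn the per-pixel probabilities into binary feature maps, for example:

    probs = model.predict(data)              # shape (n_images, 10, 10), values in [0, 1]
    binary_maps = (probs > 0.5).astype(int)  # binary image per input image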