
How to Implement Siamese Network using pretrained CNNs in Keras?

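I am developing a Siamese network for face recognition in Keras, using 224x224x3 images. In a Siamese network, two input images are passed through twin CNNs, a distance is computed between the two resulting feature vectors, and a final layer predicts whether the two images match.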

For the CNN model, I am thinking of using InceptionV3, which is already available pretrained in the keras.applications module.

#Assume all the other modules are imported correctly

from keras.applications.inception_v3 import InceptionV3

IMG_SHAPE=(224,224,3)

def return_siamese_net():

  left_input=Input(IMG_SHAPE)
  right_input=Input(IMG_SHAPE)

  model1=InceptionV3(include_top=False, weights="imagenet", input_tensor=left_input) #Left SubConvNet
  model2=InceptionV3(include_top=False, weights="imagenet", input_tensor=right_input) #Right SubConvNet

  #Do Something here

  distance_layer = ...  # Do something
  prediction = Dense(1,activation='sigmoid')(distance_layer) # Outputs 1 if the images match and 0 if they do not

  siamese_net = ...  # Do something
  return siamese_net

model=return_siamese_net()

I get an error since the model is pretrained, and I am now stuck implementing the distance layer for the twin network.

What should I add in between to make this Siamese Network work?

Solution

An important point to consider before adding the distance layer: a Siamese network has only one convolutional neural network.

The "shared weights" refer to this single network: the same weights are used to process each image of a pair (or of a triplet, depending on the loss function used), computing that image's features and, from them, its embedding.
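To make "one network, shared weights" concrete, here is a minimal NumPy sketch. The `encoder` function and its single weight matrix `W` are hypothetical stand-ins for the CNN: both images go through the same function with the same `W`, so there is only one set of parameters to learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the CNN: ONE weight matrix, created once.
W = rng.normal(size=(12, 4))


def encoder(image, weights=W):
    """Flatten a 2x2x3 toy image and map it to a 4-d embedding with the shared weights."""
    return np.maximum(image.reshape(-1) @ weights, 0.0)  # linear layer + ReLU


image_a = rng.normal(size=(2, 2, 3))  # toy "images"
image_b = rng.normal(size=(2, 2, 3))

# Both branches call the SAME encoder, i.e. the same W (and would share its gradients in training).
emb_a = encoder(image_a)
emb_b = encoder(image_b)
distance = np.linalg.norm(emb_a - emb_b)  # Euclidean distance between the two embeddings
```

Since there is a single `W`, swapping the two images only swaps the embeddings, and the distance stays the same.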

So you build one neural network and call it on both inputs; the logic will need to look like this:

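from keras import backend as K
from keras.layers import Input, Lambda, Dense
from keras.models import Model


def euclidean_distance(vectors):
    # Euclidean distance between two batches of feature vectors of shape (batch, dim)
    (features_A, features_B) = vectors
    sum_squared = K.sum(K.square(features_A - features_B), axis=1, keepdims=True)
    # Clamp to epsilon so sqrt never receives 0 (its gradient is infinite there)
    return K.sqrt(K.maximum(sum_squared, K.epsilon()))


image_A = Input(shape=...)
image_B = Input(shape=...)
# get_feature_extractor_model is your own helper that builds ONE CNN (e.g. a pretrained base)
feature_extractor = get_feature_extractor_model(shape=...)
# The same model, and therefore the same weights, is applied to both images
features_A = feature_extractor(image_A)
features_B = feature_extractor(image_B)
distance = Lambda(euclidean_distance)([features_A, features_B])
outputs = Dense(1, activation="sigmoid")(distance)
siamese_model = Model(inputs=[image_A, image_B], outputs=outputs)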
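Written out, and assuming the feature extractor f returns one feature vector per image, the model above computes the following, where w and b are the weight and bias of the final `Dense(1)` layer, σ is the sigmoid, and ε is `K.epsilon()` (1e-7 by default):

$$
d(x_A, x_B) = \sqrt{\max\Big(\sum_i \big(f(x_A)_i - f(x_B)_i\big)^2,\ \epsilon\Big)}, \qquad \hat{y} = \sigma\big(w\, d(x_A, x_B) + b\big)
$$

Because the last layer sees only the scalar distance, if the model is trained with binary cross-entropy on labels where 1 means "same person", w would be expected to become negative, so that larger distances give lower match probabilities.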

The feature extractor can, of course, be a pretrained network from Keras/TensorFlow with its output classification layer removed (or replaced).
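Putting this together with the question's InceptionV3 setup, a minimal sketch might look like the following. It assumes TensorFlow 2.x with its bundled Keras and uses TensorFlow ops in the distance function instead of `keras.backend` math functions. Several choices are assumptions rather than requirements: `pooling="avg"` turns InceptionV3's 4-D feature map into one 2048-d vector per image (the distance function sums over axis 1, so it expects 2-D features), the pretrained base is frozen at first, and the `Rescaling` layer reproduces InceptionV3's expected [-1, 1] input scaling. `build_feature_extractor` is a hypothetical helper name.

```python
import tensorflow as tf

IMG_SHAPE = (224, 224, 3)  # image size from the question


def build_feature_extractor(weights="imagenet"):
    """Hypothetical helper: ONE InceptionV3 branch mapping an image to a 2048-d vector."""
    base = tf.keras.applications.InceptionV3(
        include_top=False,      # drop the ImageNet classification head
        weights=weights,        # "imagenet" downloads pretrained weights; None = random init
        input_shape=IMG_SHAPE,
        pooling="avg",          # global average pooling -> (batch, 2048)
    )
    base.trainable = False      # freeze the pretrained layers at first (a common choice)
    inputs = tf.keras.Input(shape=IMG_SHAPE)
    # Scale raw [0, 255] pixels to [-1, 1], as inception_v3.preprocess_input does
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    outputs = base(x, training=False)  # keep BatchNorm layers in inference mode
    return tf.keras.Model(inputs, outputs, name="feature_extractor")


def euclidean_distance(vectors):
    """Euclidean distance between two (batch, dim) tensors, clamped away from sqrt(0)."""
    features_a, features_b = vectors
    sum_squared = tf.reduce_sum(tf.square(features_a - features_b), axis=1, keepdims=True)
    return tf.sqrt(tf.maximum(sum_squared, tf.keras.backend.epsilon()))


def return_siamese_net(weights="imagenet"):
    left_input = tf.keras.Input(shape=IMG_SHAPE)
    right_input = tf.keras.Input(shape=IMG_SHAPE)

    feature_extractor = build_feature_extractor(weights)  # built ONCE
    features_left = feature_extractor(left_input)          # same weights...
    features_right = feature_extractor(right_input)        # ...for both images

    distance = tf.keras.layers.Lambda(euclidean_distance)([features_left, features_right])
    prediction = tf.keras.layers.Dense(1, activation="sigmoid")(distance)
    return tf.keras.Model(inputs=[left_input, right_input], outputs=prediction)


model = return_siamese_net()
```

Since InceptionV3 is instantiated once and called on both inputs, both branches share every weight. The model could then be compiled with, for example, `model.compile(optimizer="adam", loss="binary_crossentropy")` and trained on `[left_images, right_images]` pairs with 0/1 labels.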

The main logic should look like the code above. If you want to use triplet loss, you would need three inputs (anchor, positive, negative), but to begin with I would recommend sticking to the basics.
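For reference, the triplet variant is usually trained with a loss of the following form, sketched here in NumPy on precomputed embeddings. The squared Euclidean distance and the default `margin=0.2` are assumptions (common choices, as in FaceNet), not something specified above.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings with shape (batch, dim).

    Penalizes triplets where d(anchor, positive) is not at least `margin`
    smaller than d(anchor, negative), using squared Euclidean distances.
    """
    pos_dist = np.sum(np.square(anchor - positive), axis=1)
    neg_dist = np.sum(np.square(anchor - negative), axis=1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))


anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.1, 0.0], [1.0, 1.0]])
negative = np.array([[1.0, 0.0], [1.0, 1.2]])
loss = triplet_loss(anchor, positive, negative)
print(loss)  # about 0.08: only the second triplet violates the margin
```

In a Keras model, the same computation would be applied to the three outputs of the single shared feature extractor, called once each on the anchor, positive and negative images.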

It would also be a good idea to consult this documentation: