I am developing a Siamese Network for Face Recognition using Keras for 224x224x3 sized images. The architecture of a Siamese Network is like this:
For the CNN model, I am thinking of using the InceptionV3 model which is already pretrained in the Keras.applications module.
#Assume all the other modules are imported correctly
from keras.applications.inception_v3 import InceptionV3
IMG_SHAPE=(224,224,3)
def return_siamese_net():
left_input=Input(IMG_SHAPE)
right_input=Input(IMG_SHAPE)
model1=InceptionV3(include_top=False, weights="imagenet", input_tensor=left_input) #Left SubConvNet
model2=InceptionV3(include_top=False, weights="imagenet", input_tensor=right_input) #Right SubConvNet
#Do Something here
distance_layer = #Do Something
prediction = Dense(1,activation='sigmoid')(distance_layer) # Outputs 1 if the images match and 0 if it does not
siamese_net = #Do Something
return siamese_net
model=return_siamese_net()
I get error since the model is pretrained, and I am now stuck at implementing the Distance Layer for the Twin Network.
What should I add in between to make this Siamese Network work?
A very important note, before you use the distance layer, is to take into consideration that you have only one convolutional neural network.
The shared weights actually refer to only one convolutional neural network, and the weights are shared because the same weights are used when passing a pair of images (depending on the loss function used) in order to compute the features and subsequently the embeddings of each input image.
You would have only one neural network, and the block logic will need to look like:
def euclidean_distance(vectors):
(features_A, features_B) = vectors
sum_squared = K.sum(K.square(features_A - features_B), axis=1, keepdims=True)
return K.sqrt(K.maximum(sum_squared, K.epsilon()))
image_A = Input(shape=...)
image_B = Input(shape=...)
feature_extractor_model = get_feature_extractor_model(shape=...)
features_A = feature_extractor(image_A)
features_B = feature_extractor(image_B)
distance = Lambda(euclidean_distance)([features_A, features_B])
outputs = Dense(1, activation="sigmoid")(distance)
siamese_model = Model(inputs=[image_A, image_B], outputs=outputs)
Of course, the feature extractor model can be a pretrained network from Keras/TensorFlow, with the output classification layer improved.
The main logic should be like the one above, of course, if you want to use triplet loss, that would require three inputs (Anchor, Positive, Negative), but for the beginning I would recommend to stick to the basics.
Also, it would a good idea to consult this documentation: