I am trying to build a character-level Siamese neural network with Keras, to learn whether two names are similar or not.
My two inputs X1 and X2 are each a 3-D one-hot tensor:
X[number_of_cases, max_length_of_name, total_number_of_chars_in_DB]
In the real case, I also have one binary output vector of size y[number_of_cases].
So for example:
print(X1[:3, :2])
will give the following result:
[[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
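For reference, a minimal sketch of how such a one-hot tensor can be built (the alphabet, the encode_names helper, and the zero-padding scheme here are illustrative assumptions, not my actual preprocessing):
import numpy as np

# Hypothetical alphabet; the real one has total_number_of_chars_in_DB entries.
alphabet = "abcdefghijklmnopqrstuvwxyz"
char_to_index = {c: i for i, c in enumerate(alphabet)}

def encode_names(names, max_length):
    # One name per row: shape (number_of_cases, max_length, len(alphabet)),
    # with short names zero-padded on the right.
    X = np.zeros((len(names), max_length, len(alphabet)), dtype=np.float32)
    for i, name in enumerate(names):
        for j, ch in enumerate(name[:max_length]):
            X[i, j, char_to_index[ch]] = 1.0
    return X

X_demo = encode_names(["anna", "ana"], max_length=15)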
I use the following code to build my model:
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Bidirectional, Lambda
from keras import backend as K
input_1 = Input(shape=(X1.shape[1], X1.shape[2],))
input_2 = Input(shape=(X2.shape[1], X2.shape[2],))
lstm1 = Bidirectional(LSTM(256, input_shape=(X1.shape[1], X1.shape[2],), return_sequences=False))
lstm2 = Bidirectional(LSTM(256, input_shape=(X1.shape[1], X1.shape[2],), return_sequences=False))
l1_norm = lambda x: 1 - K.abs(x[0] - x[1])
merged = Lambda(function=l1_norm, output_shape=lambda x: x[0], name='L1_distance')([lstm1, lstm2])
predictions = Dense(1, activation='sigmoid', name='classification_layer')(merged)
model = Model([input_1, input_2], predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([X1, X2], y, validation_split=0.1, epochs=20, shuffle=True, batch_size=256)
I am getting the following error:
Layer L1_distance was called with an input that isn't a symbolic tensor.
I think the problem is that I need to tell the L1_distance layer to use the outputs of the two preceding LSTM layers, but I do not know how to do that.
My second question is: am I required to add an embedding layer before the LSTM, even for a character-level network?
Thank you.
Your model's inputs are [input_1, input_2] and its output is predictions, but input_1 and input_2 were never connected to lstm1 and lstm2. The model's input layers were therefore not connected to its output layer, which is why you get the error. (Note also that model.fit needs the target vector y as its second argument.)
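In the functional API, a layer only becomes part of the graph once it is called on a tensor; until then it is just a Python object. A minimal illustration (the sizes are made up):
from keras.layers import Input, LSTM, Bidirectional

inp = Input(shape=(10, 26))     # (timesteps, features), made-up sizes
layer = Bidirectional(LSTM(8))  # a layer object; connected to nothing yet
out = layer(inp)                # calling it on a tensor wires it into the graph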
Try this instead:
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Bidirectional, Lambda
from keras import backend as K
input_1 = Input(shape=(X1.shape[1], X1.shape[2],))
input_2 = Input(shape=(X2.shape[1], X2.shape[2],))
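# Call each Bidirectional LSTM layer on its input tensor so it joins the graph: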
lstm1 = Bidirectional(LSTM(256, return_sequences=False))(input_1)
lstm2 = Bidirectional(LSTM(256, return_sequences=False))(input_2)
l1_norm = lambda x: 1 - K.abs(x[0] - x[1])
merged = Lambda(function=l1_norm, output_shape=lambda x: x[0], name='L1_distance')([lstm1, lstm2])
predictions = Dense(1, activation='sigmoid', name='classification_layer')(merged)
model = Model([input_1, input_2], predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([X1, X2], y, validation_split=0.1, epochs=20, shuffle=True, batch_size=256)
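As for your second question: no, an embedding layer is not required; an LSTM can consume the one-hot character vectors directly, as above. An Embedding layer is a common, more memory-efficient alternative that takes integer character indices instead of one-hot rows. A minimal sketch of that variant (the vocabulary size, sequence length, and embedding dimension below are made-up values):
from keras.layers import Input, Embedding, LSTM, Bidirectional

n_chars, max_length, emb_dim = 38, 15, 32
idx_input = Input(shape=(max_length,))                        # integer char indices
embedded = Embedding(input_dim=n_chars, output_dim=emb_dim)(idx_input)
encoded = Bidirectional(LSTM(256))(embedded)                  # same encoder as above
With this variant, X1 and X2 would become 2-D integer index matrices rather than 3-D one-hot tensors.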