I am trying to build a character-level Siamese neural network with Keras, to learn whether two names are similar or not.
My two inputs X1 and X2 are each a 3-D one-hot tensor:
X[number_of_cases, max_length_of_name, total_number_of_chars_in_DB]
In the real case, I also have one binary output vector of size y[number_of_cases].
So for example:
print(X1[:3, :2])
will give the following result:
[[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
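For reference, a minimal sketch of how such a one-hot tensor can be built (the alphabet, the encode_names helper, and the zero-padding scheme here are illustrative assumptions, not my actual preprocessing):
import numpy as np

# Hypothetical alphabet; the real one has total_number_of_chars_in_DB entries.
alphabet = "abcdefghijklmnopqrstuvwxyz"
char_to_index = {c: i for i, c in enumerate(alphabet)}

def encode_names(names, max_length):
    # One name per row: shape (number_of_cases, max_length, len(alphabet)),
    # with short names zero-padded on the right.
    X = np.zeros((len(names), max_length, len(alphabet)), dtype=np.float32)
    for i, name in enumerate(names):
        for j, ch in enumerate(name[:max_length]):
            X[i, j, char_to_index[ch]] = 1.0
    return X

X_demo = encode_names(["anna", "ana"], max_length=15)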
I use the following code to build my model:
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Bidirectional, Lambda
from keras import backend as K
input_1 = Input(shape=(X1.shape[1], X1.shape[2],))
input_2 = Input(shape=(X2.shape[1], X2.shape[2],))
lstm1 = Bidirectional(LSTM(256, input_shape=(X1.shape[1], X1.shape[2],), return_sequences=False))
lstm2 = Bidirectional(LSTM(256, input_shape=(X1.shape[1], X1.shape[2],), return_sequences=False))
l1_norm = lambda x: 1 - K.abs(x[0] - x[1])
merged = Lambda(function=l1_norm, output_shape=lambda x: x[0], name='L1_distance')([lstm1, lstm2])
predictions = Dense(1, activation='sigmoid', name='classification_layer')(merged)
model = Model([input_1, input_2], predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([X1, X2], y, validation_split=0.1, epochs=20, shuffle=True, batch_size=256)
I am getting the following error:
Layer L1_distance was called with an input that isn't a symbolic tensor.
I think the problem is that I need to tell the L1_distance layer to use the outputs of the two preceding LSTM layers, but I do not know how to do that.
My second question is: am I required to add an embedding layer before the LSTM, even for a character-level network?
Thank you.
Your model's inputs are [input_1, input_2] and its output is predictions, but input_1 and input_2 were never connected to lstm1 and lstm2. The model's input layers were therefore not connected to its output layer, which is why you get the error. (Note also that model.fit needs the target vector y as its second argument.)
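In the functional API, a layer only becomes part of the graph once it is called on a tensor; until then it is just a Python object. A minimal illustration (the sizes are made up):
from keras.layers import Input, LSTM, Bidirectional

inp = Input(shape=(10, 26))     # (timesteps, features), made-up sizes
layer = Bidirectional(LSTM(8))  # a layer object; connected to nothing yet
out = layer(inp)                # calling it on a tensor wires it into the graph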
Try this instead:
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Bidirectional, Lambda
from keras import backend as K
input_1 = Input(shape=(X1.shape[1], X1.shape[2],))
input_2 = Input(shape=(X2.shape[1], X2.shape[2],))
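# Call each Bidirectional LSTM layer on its input tensor so it joins the graph: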
lstm1 = Bidirectional(LSTM(256, return_sequences=False))(input_1)
lstm2 = Bidirectional(LSTM(256, return_sequences=False))(input_2)
l1_norm = lambda x: 1 - K.abs(x[0] - x[1])
merged = Lambda(function=l1_norm, output_shape=lambda x: x[0], name='L1_distance')([lstm1, lstm2])
predictions = Dense(1, activation='sigmoid', name='classification_layer')(merged)
model = Model([input_1, input_2], predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([X1, X2], y, validation_split=0.1, epochs=20, shuffle=True, batch_size=256)
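As for your second question: no, an embedding layer is not required; an LSTM can consume the one-hot character vectors directly, as above. An Embedding layer is a common, more memory-efficient alternative that takes integer character indices instead of one-hot rows. A minimal sketch of that variant (the vocabulary size, sequence length, and embedding dimension below are made-up values):
from keras.layers import Input, Embedding, LSTM, Bidirectional

n_chars, max_length, emb_dim = 38, 15, 32
idx_input = Input(shape=(max_length,))                        # integer char indices
embedded = Embedding(input_dim=n_chars, output_dim=emb_dim)(idx_input)
encoded = Bidirectional(LSTM(256))(embedded)                  # same encoder as above
With this variant, X1 and X2 would become 2-D integer index matrices rather than 3-D one-hot tensors.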