I am interested in sequence tagging for NER. I followed the code at "https://github.com/monikkinom/ner-lstm/blob/master/model.py" to build my model, which looks like this:
X = tf.placeholder(tf.float32, shape=[None, timesteps, num_input])
Y = tf.placeholder(tf.float32, shape=[None, timesteps, num_classes])
y_true = tf.reshape(tf.stack(Y), [-1, num_classes])
The inputs are:

X: (batch_size, max_sent_length, word_embed_dim)
Y: (batch_size, max_sent_length, number_of_labels)
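To be concrete about the shapes, a batch that feeds these placeholders looks roughly like this (dummy sizes for illustration only; shorter sentences are zero-padded up to max_sent_length):

import numpy as np

# hypothetical sizes, purely for illustration
batch_size, max_sent_length, word_embed_dim, number_of_labels = 64, 50, 100, 9

# padded word embeddings and one-hot labels for one batch
X_batch = np.zeros((batch_size, max_sent_length, word_embed_dim), dtype=np.float32)
Y_batch = np.zeros((batch_size, max_sent_length, number_of_labels), dtype=np.float32)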
Then I pass these to a bidirectional LSTM:
def BiRNN(x):
    # static_bidirectional_rnn expects a length-`timesteps` list of
    # (batch_size, num_input) tensors, so swap the batch/time axes and unstack
    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))

    def rnn_cell():
        cell = tf.nn.rnn_cell.LSTMCell(rnn_size, forget_bias=1, state_is_tuple=True)
        return cell

    fw_cell = rnn_cell()
    bw_cell = rnn_cell()
    # output is a length-`timesteps` list of (batch_size, 2 * rnn_size) tensors
    output, _, _ = tf.nn.static_bidirectional_rnn(fw_cell, bw_cell, x, dtype=tf.float32)

    weight, bias = weight_and_bias(2 * rnn_size, num_classes)
    # stack back to (batch_size, timesteps, 2 * rnn_size), then flatten to
    # (batch_size * timesteps, 2 * rnn_size) so every timestep gets its own logits
    output = tf.reshape(tf.transpose(tf.stack(output), perm=[1, 0, 2]), [-1, 2 * rnn_size])
    return tf.matmul(output, weight) + bias
where rnn_size = 128.
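The helper weight_and_bias just creates the projection variables; it roughly follows the linked repository (the exact initializers may differ):

def weight_and_bias(in_size, out_size):
    # trainable projection from the 2 * rnn_size BiLSTM output to num_classes logits
    weight = tf.truncated_normal([in_size, out_size], stddev=0.01)
    bias = tf.constant(0.1, shape=[out_size])
    return tf.Variable(weight), tf.Variable(bias)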
Then I compute the loss and the training op:
logits = BiRNN(X)
logits = tf.reshape(tf.stack(logits), [-1, timesteps, num_classes])
prediction = tf.reshape(logits, [-1, num_classes])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y_true))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(cost)
I used batch_size = 64 and trained for 30 epochs.
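The training loop is the usual feed-dict loop, roughly like this (train_x and train_y are the padded arrays described above; the names are mine):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(30):
        for i in range(0, len(train_x), batch_size):
            batch_x = train_x[i:i + batch_size]
            batch_y = train_y[i:i + batch_size]
            sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})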
But my model predicts only one label every time, and I am not able to pinpoint the problem in my code. Please help.
Please check the dimensions of the tensors y_true, output (at both places where it is assigned inside BiRNN), logits, and prediction, and verify that they match your expectations.
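For example, you can print the static shapes right after each tensor is defined (a "?" marks a dimension that is only known at run time), and then check the dynamic shape on one real batch. A minimal sketch, where batch_x stands for one batch of your input data:

print(y_true.get_shape())      # expect (?, num_classes)
print(logits.get_shape())      # expect (?, timesteps, num_classes)
print(prediction.get_shape())  # expect (?, num_classes)

# dynamic shape of the predictions on a real batch
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.shape(prediction), feed_dict={X: batch_x}))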