python, tensorflow, lstm, sequence-to-sequence, named-entity-recognition

Sequence tagging task in TensorFlow using a bidirectional LSTM


I am interested in sequence tagging for NER. I followed the code at https://github.com/monikkinom/ner-lstm/blob/master/model.py to build my model as below:

X = tf.placeholder(tf.float32, shape=[None, timesteps, num_input])    # word embeddings
Y = tf.placeholder(tf.float32, shape=[None, timesteps, num_classes])  # one-hot labels
y_true = tf.reshape(Y, [-1, num_classes])  # flatten to (batch_size * timesteps, num_classes)

The inputs are:

X: (batch_size, max_sent_length, word_embed_dim)
Y: (batch_size, max_sent_length, number_of_labels)
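
For concreteness, one feed batch looks like the sketch below; the sizes 50 (max_sent_length), 300 (word_embed_dim) and 9 (number_of_labels) are placeholder values for illustration, not my actual settings:

import numpy as np

# Example sizes for illustration only
batch_size, timesteps, num_input, num_classes = 64, 50, 300, 9

X_batch = np.zeros((batch_size, timesteps, num_input), dtype=np.float32)   # padded word embeddings
Y_batch = np.zeros((batch_size, timesteps, num_classes), dtype=np.float32) # one-hot label per token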

Then I pass the values to a bidirectional LSTM layer:

def BiRNN(x):
    # (batch_size, timesteps, num_input) -> list of `timesteps` tensors of
    # shape (batch_size, num_input), as static_bidirectional_rnn expects
    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))

    def rnn_cell():
        return tf.nn.rnn_cell.LSTMCell(rnn_size, forget_bias=1, state_is_tuple=True)

    fw_cell = rnn_cell()
    bw_cell = rnn_cell()
    output, _, _ = tf.nn.static_bidirectional_rnn(fw_cell, bw_cell, x, dtype=tf.float32)
    weight, bias = weight_and_bias(2 * rnn_size, num_classes)
    # back to batch-major, then flatten to (batch_size * timesteps, 2 * rnn_size)
    output = tf.reshape(tf.transpose(tf.stack(output), perm=[1, 0, 2]), [-1, 2 * rnn_size])
    return tf.matmul(output, weight) + bias

Here, rnn_size = 128.
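
The weight_and_bias helper comes from the linked repo; a minimal sketch of it, assuming the usual truncated-normal initialization rather than quoting the repo verbatim, is:

def weight_and_bias(in_size, out_size):
    # Truncated-normal weights and a small constant bias (assumed initialization)
    weight = tf.truncated_normal([in_size, out_size], stddev=0.01)
    bias = tf.constant(0.1, shape=[out_size])
    return tf.Variable(weight), tf.Variable(bias)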

Then I compute the loss and set up the training op:

logits = BiRNN(X)
logits = tf.reshape(logits, [-1, timesteps, num_classes])
prediction = tf.reshape(logits, [-1, num_classes])  # (batch_size * timesteps, num_classes)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y_true))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(cost)

I use batch_size = 64 and train for 30 epochs.
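
My training loop looks roughly like this (get_batches is a stand-in name for my batching helper, which yields padded (X_batch, Y_batch) pairs):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(30):
        for X_batch, Y_batch in get_batches(train_X, train_Y, batch_size=64):
            _, loss = sess.run([train_op, cost],
                               feed_dict={X: X_batch, Y: Y_batch})
        print("epoch %d, last batch loss %.4f" % (epoch, loss))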
But my model predicts the same single label every time. I am unable to pinpoint the problem in my code. Please help.


Solution

  • Check the shapes of the tensors y_true, output (in both places where it is assigned), logits, and prediction, and verify that each one matches your expectation; one quick way to do this is sketched below.
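
    For example, you can print the static shapes at graph-construction time (a sketch; the expected shapes in the comments follow from the reshapes in your code, and output must be inspected inside BiRNN since it is local to that function):

    print("y_true:    ", y_true.get_shape())      # expect (?, num_classes)
    print("logits:    ", logits.get_shape())      # expect (?, timesteps, num_classes)
    print("prediction:", prediction.get_shape())  # expect (?, num_classes)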