tensorflow, machine-learning, lstm

Generate text with a trained character-level LSTM model


I trained a model to generate sentences as follows: each training example consists of two sequences, x, which is a sequence of characters, and y, which is the same sequence shifted by one character. The model is based on an LSTM and is built with TensorFlow.
My question is: since the model takes input sequences of a fixed length (50 in my case), how can I make predictions when giving it only a single character as a seed? I've seen examples where, after training, sentences are generated by simply feeding in a single character.
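To make the x/y relationship concrete, here is a minimal sketch of how one such pair could be cut from a text. The helper name `make_training_pair` and the sample string are made up for illustration and are not part of the actual training code.

```python
def make_training_pair(text, seq_len, start=0):
    """Return (x, y) where y is x shifted left by one character.

    Hypothetical helper: takes seq_len + 1 characters starting at `start`,
    so that every character in x has a "next character" target in y.
    """
    window = text[start:start + seq_len + 1]
    if len(window) < seq_len + 1:
        raise ValueError("not enough text for a full training pair")
    return window[:-1], window[1:]


x_chars, y_chars = make_training_pair("hello world", seq_len=5)
print(repr(x_chars), repr(y_chars))
```

In the real pipeline the characters would additionally be mapped to integer ids before being fed to the placeholders.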
Here is my code:

    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [batch_size, truncated_backprop], name='x')
        y = tf.placeholder(tf.int32, [batch_size, truncated_backprop], name='y')

    with tf.name_scope('weights'):
        W = tf.Variable(np.random.rand(n_hidden, num_classes), dtype=tf.float32)
        b = tf.Variable(np.random.rand(1, num_classes), dtype=tf.float32)

    inputs_series = tf.split(x, truncated_backprop, 1)
    labels_series = tf.unstack(y, axis=1)

    with tf.name_scope('LSTM'):
        cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, state_is_tuple=True)
        cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
        cell = tf.contrib.rnn.MultiRNNCell([cell] * n_layers)

    states_series, current_state = tf.contrib.rnn.static_rnn(cell, inputs_series, \
        dtype=tf.float32)

    logits_series = [tf.matmul(state, W) + b for state in states_series]
    prediction_series = [tf.nn.softmax(logits) for logits in logits_series]

    losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) \
        for logits, labels, in zip(logits_series, labels_series)]
    total_loss = tf.reduce_mean(losses)

    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

Solution

  • I suggest you use dynamic_rnn instead of static_rnn. dynamic_rnn unrolls the network at execution time, so it accepts inputs of any length. Your input placeholder would then be:

    # features: size of each input vector (num_classes if you feed one-hot characters)
    x = tf.placeholder(tf.float32, [batch_size, None, features], name='x')
    

    Next, you'll need a way to input your own initial state into the network. You can do that by passing the initial_state parameter to dynamic_rnn, like:

    initialstate = cell.zero_state(batch_size, tf.float32)
    outputs, current_state = tf.nn.dynamic_rnn(cell,
                                               x,
                                               initial_state=initialstate)
    

    With that in place, you can generate text from a single character by feeding the graph one character at a time, passing in the previously predicted character and the previous state at each step:

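    prompt = 's'  # seed character; any character from your vocabulary works
    inp = one_hot(prompt)  # preprocessing, as you probably want to feed one-hot vectors
    state = None
    # num_chars: how many characters to generate (assumed to be defined by you)
    for _ in range(num_chars):
        # x has shape [batch, time, features]; [[inp]] means batch = time = 1,
        # so batch_size must be 1 at generation time
        if state is None:
            feed = {x: [[inp]]}  # first step: the graph starts from the zero state
        else:
            feed = {x: [[inp]], initialstate: state}  # later steps: resume from the last state

        out, state = sess.run([outputs, current_state], feed_dict=feed)

        # out holds the LSTM outputs; project them with W and b and apply softmax
        # to get character probabilities, then pick a character and one-hot it
        inp = process(out)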
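The `one_hot` and `process` helpers above are left undefined, so here is a rough, framework-free sketch of what they and the surrounding loop might look like. Everything in it is an assumption made for illustration: `step_fn` stands in for the `sess.run` call, `toy_step` is a fake model that always predicts the next vocabulary entry, and `sample_index` plays the role of `process` by sampling from the predicted probabilities.

```python
import random


def one_hot(index, num_classes):
    """Return a one-hot list of length num_classes with 1.0 at `index`."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def sample_index(probs, rng):
    """Draw one index from a probability distribution given as a list."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(step_fn, seed_char, vocab, length, rng=None):
    """Generate `length` characters after `seed_char`, one step at a time.

    step_fn(one_hot_vector, state) -> (probs, new_state) is a stand-in for
    running the graph on a single character; state is None on the first step.
    """
    rng = rng or random.Random(0)
    char_to_ix = {c: i for i, c in enumerate(vocab)}
    idx = char_to_ix[seed_char]
    state = None
    text = seed_char
    for _ in range(length):
        # Feed the previous character together with the previous state.
        probs, state = step_fn(one_hot(idx, len(vocab)), state)
        idx = sample_index(probs, rng)
        text += vocab[idx]
    return text


def toy_step(inp, state):
    """Fake model (assumption): puts all probability on the next vocab entry."""
    n = len(inp)
    current = inp.index(1.0)
    probs = one_hot((current + 1) % n, n)
    steps_so_far = 0 if state is None else state
    return probs, steps_so_far + 1


print(generate(toy_step, "a", "abc", 5))
```

With a real model, `step_fn` would wrap the `sess.run` call and turn the LSTM output into probabilities. Sampling from the distribution, rather than always taking the most likely character, tends to give less repetitive text, but taking the argmax is also a valid choice.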