I am coming from this tutorial, which uses a multinomial distribution in eager execution to get a final prediction for the next character for text generation, based on a predictions tensor coming from our RNN.
# using a multinomial distribution to predict the character returned by the model
temperature = 0.5
predictions = predictions / temperature
predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()
My questions are:
Isn't temperature (here 0.5) just scaling all predictions? Why does it influence the multinomial selection then?
[0.2, 0.4, 0.3, 0.1]/temperature = [0.4, 0.8, 0.6, 0.2]
So isn't the multinomial normalizing the probabilities? And when scaling, don't we just increase the probability for each character, capped at 1?
What does [-1, 0].numpy() do? I am completely lost with this one.
Any hints are appreciated.
tf.multinomial interprets its input as unnormalized log-probabilities (logits): it exponentiates each value and normalizes the results (a softmax) before sampling. Dividing the logits by a temperature therefore changes the relative weights, not just their scale. Thus, the smaller the logit in the first place, the smaller its share of the probability mass becomes for temperatures smaller than 1, and the larger for temperatures larger than 1. Comparing each original weight exp(x) to its scaled counterpart exp(x / 0.5):
math.exp(0.4) / math.exp(0.8) = 0.6703
math.exp(0.3) / math.exp(0.6) = 0.7408
math.exp(0.2) / math.exp(0.4) = 0.8187
math.exp(0.1) / math.exp(0.2) = 0.9048
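To see the effect after normalization, here is a minimal sketch in plain Python with the softmax written out by hand (mirroring what tf.multinomial does with its logits input before sampling):

```python
import math

def softmax(logits):
    # exponentiate and normalize -- this is why the absolute
    # scale of the logits matters, not just their order
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 0.4, 0.3, 0.1]
temperature = 0.5

p_plain = softmax(logits)
p_sharp = softmax([x / temperature for x in logits])

# dividing by a temperature < 1 sharpens the distribution:
# the most likely character gains probability, the rest lose it
print(max(p_plain))  # ~0.289
print(max(p_sharp))  # ~0.329
```

So the probabilities are not uniformly "increased with a limit at 1": after normalization they still sum to 1, but the mass shifts toward the larger logits.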
[-1, 0].numpy()

just extracts the sampled value from the multinomial output tensor and converts it to a plain Python integer. For example:

tf.multinomial(predictions, num_samples=1)
tf.Tensor([[3]], shape=(1, 1), dtype=int64)

[-1] selects the last row, [0] selects the first sample in that row, and .numpy() turns the resulting 0-d tensor into 3.
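Since the output shape here is always (1, 1), the same lookup can be sketched with a plain nested list standing in for the tensor (the real tensor supports the combined tensor[-1, 0] index; a list needs the two indices applied one at a time):

```python
# Stand-in for tf.Tensor([[3]]): shape is (batch, num_samples) == (1, 1).
sampled = [[3]]

# [-1] picks the last (only) row, [0] the first sample in it.
# On the real tensor this is sampled[-1, 0], and .numpy()
# then converts the 0-d result to a Python int.
predicted_id = sampled[-1][0]
print(predicted_id)  # -> 3
```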