I want to choose the next token manually, instead of letting llama-cpp-python
choose one for me automatically.
This requires seeing a list of candidate next tokens, along with their probabilities, so that I can pick the right one according to my own criteria.
How can I do this?
You need to create the model with logits_all=True:
model = Llama(model_path="your model here", logits_all=True)
Then request a completion with max_tokens=1 and the number of logprobs you need:
out = model.create_completion("The capital of France is", max_tokens=1, logprobs=10)
Then out["choices"][0]["logprobs"]["top_logprobs"][0]
looks like this:
{' Paris': np.float32(-0.531455),
' not': np.float32(-2.7322779),
' located': np.float32(-3.029975),
' the': np.float32(-3.4100742),
' a': np.float32(-3.6376095),
' also': np.float32(-4.1634436),
' actually': np.float32(-4.2124586),
'...': np.float32(-4.279561),
' in': np.float32(-4.5441475),
' officially': np.float32(-4.6838427)}
You can convert a logprob into a probability with np.exp().
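Here is a minimal sketch of the selection step, using the logprob values shown above (hardcoded for illustration). The argmax criterion is just a placeholder for whatever criterion you actually want:

```python
import numpy as np

# Top candidates from the example completion above (values copied from the output).
top_logprobs = {
    ' Paris': -0.531455,
    ' not': -2.7322779,
    ' located': -3.029975,
    ' the': -3.4100742,
    ' a': -3.6376095,
}

# Convert logprobs to probabilities.
probs = {tok: float(np.exp(lp)) for tok, lp in top_logprobs.items()}

# Pick a token by your own criterion -- here, simply the most probable one.
best = max(probs, key=probs.get)
print(best, probs[best])  # ' Paris' with probability ~0.59

# To continue generating, append the chosen token to the prompt and call
# create_completion again for the next step, e.g.:
# prompt = "The capital of France is" + best
# out = model.create_completion(prompt, max_tokens=1, logprobs=10)
```

Note that the probabilities only sum over the top candidates returned, not the whole vocabulary, so renormalize them if your criterion needs a proper distribution over the shown tokens.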