I want to choose the next token manually, instead of letting llama-cpp-python
choose one for me automatically.
This requires seeing a list of candidate next tokens, along with their probabilities, so that I can pick the right one according to my own criteria.
How can I do this?
You need to create the model with logits_all=True:
model = Llama(model_path="your model here", logits_all=True)
Then request a completion with max_tokens=1 and the number of logprobs you need:
out = model.create_completion("The capital of France is", max_tokens=1, logprobs=10)
Then out["choices"][0]["logprobs"]["top_logprobs"][0]
looks like this:
{' Paris': np.float32(-0.531455),
' not': np.float32(-2.7322779),
' located': np.float32(-3.029975),
' the': np.float32(-3.4100742),
' a': np.float32(-3.6376095),
' also': np.float32(-4.1634436),
' actually': np.float32(-4.2124586),
'...': np.float32(-4.279561),
' in': np.float32(-4.5441475),
' officially': np.float32(-4.6838427)}
You can convert a logprob into a probability with np.exp().
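Here is a minimal sketch of the selection step, using the logprob values shown above (hardcoded for illustration). The argmax criterion is just a placeholder for whatever criterion you actually want:

```python
import numpy as np

# Top candidates from the example completion above (values copied from the output).
top_logprobs = {
    ' Paris': -0.531455,
    ' not': -2.7322779,
    ' located': -3.029975,
    ' the': -3.4100742,
    ' a': -3.6376095,
}

# Convert logprobs to probabilities.
probs = {tok: float(np.exp(lp)) for tok, lp in top_logprobs.items()}

# Pick a token by your own criterion -- here, simply the most probable one.
best = max(probs, key=probs.get)
print(best, probs[best])  # ' Paris' with probability ~0.59

# To continue generating, append the chosen token to the prompt and call
# create_completion again for the next step, e.g.:
# prompt = "The capital of France is" + best
# out = model.create_completion(prompt, max_tokens=1, logprobs=10)
```

Note that the probabilities only sum over the top candidates returned, not the whole vocabulary, so renormalize them if your criterion needs a proper distribution over the shown tokens.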