I'm learning to use Google AI Studio and when generating the snippet I came across these terms:
const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};
I'm struggling to understand what those terms mean. What are topP, topK, and maxOutputTokens? I want to understand these in order to use them properly.
You can find the details in the model parameters documentation. But in short:
max output tokens limits the maximum length of the response. You literally set a cap, in tokens, on how short (or long) the answer can be. Roughly speaking, just as a reference, 100 tokens is around 60-80 words.

Gemini is a generative model, which means that, at a high level, it "composes" (or generates) an answer one token at a time from its semantic knowledge of a given language (a spoken language, a programming language, etc.). So you can imagine a bag of possible "next tokens" at each step of writing a sentence, and top-k and top-p customize which part of that vocabulary is actually considered.
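To make that concrete, you can picture the bag of candidates as a list of (token, probability) pairs. The tokens and numbers below are made up purely for illustration:

// Hypothetical distribution over possible next tokens (illustrative values only).
const candidates = [
  ["cat", 0.10],
  ["dog", 0.05],
  ["bird", 0.04],
  ["fish", 0.02],
  ["tree", 0.01],
  // ...imagine ~200 entries in total
];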
With top-k you basically limit the universe of possible tokens. If the next token could be one of 200 different candidates, you keep only the top k of them. So top-k = 30 means that the model only considers the 30 most likely tokens from that list. The next token is not picked yet at this step.
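A minimal sketch of that filtering step, assuming the candidates list from above (just to illustrate the idea, not how Gemini actually implements it):

// top-k: sort by probability and keep only the k most likely candidates.
function topK(cands, k) {
  return [...cands].sort((a, b) => b[1] - a[1]).slice(0, k);
}

// e.g. topK(candidates, 30) keeps at most 30 entries; the rest are discarded.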
With top-p you work with a limit based on cumulative probability. Each candidate token has a probability related to how likely the model considers it to follow the previous tokens. So if you define top-p = 0.2 (20%), it means that from the 30 tokens you kept with top-k, it builds a new list containing the tokens whose probabilities sum up to at most 20%. I.e. if the first token has a 10% probability, the second has 5%, the third has 4% and the fourth has 2%, the list after the top-p step will contain the first, the second and the third (10% + 5% + 4% = 19%). The next token is still not picked at this step either.
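A similar sketch for the top-p step, following the description above (real implementations differ in details, for example whether the token that crosses the threshold is still kept):

// top-p: walk the candidates from most to least likely and keep them
// while the cumulative probability stays within p.
function topP(cands, p) {
  const sorted = [...cands].sort((a, b) => b[1] - a[1]);
  const kept = [];
  let cumulative = 0;
  for (const [token, prob] of sorted) {
    if (cumulative + prob > p) break;
    cumulative += prob;
    kept.push([token, prob]);
  }
  return kept;
}

// topP(topK(candidates, 30), 0.2) -> [["cat", 0.10], ["dog", 0.05], ["bird", 0.04]]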
Finally comes the temperature parameter, which defines how deterministically the next token is picked from that final list. A temperature of 0 gives the most deterministic choice, where the highest-probability token is always chosen; the maximum temperature gives the most random choice, which means that even the less probable tokens may be picked.
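And a rough sketch of the temperature step, applied here to the probabilities of the candidates that survived top-k and top-p (an illustration under simplifying assumptions, not the actual sampler):

// temperature: rescale the kept probabilities before the final random pick.
// t close to 0 concentrates almost all the probability on the top token;
// a high t flattens the distribution so unlikely tokens get a real chance.
function applyTemperature(cands, t) {
  if (t === 0) {
    const best = cands.reduce((a, b) => (b[1] > a[1] ? b : a));
    return [[best[0], 1]]; // fully deterministic: greedy pick
  }
  const scaled = cands.map(([tok, p]) => [tok, Math.pow(p, 1 / t)]);
  const total = scaled.reduce((sum, [, w]) => sum + w, 0);
  return scaled.map(([tok, w]) => [tok, w / total]); // renormalize to sum to 1
}

// Final step: sample one token according to the rescaled probabilities.
function sampleToken(cands) {
  let r = Math.random();
  for (const [tok, p] of cands) {
    if ((r -= p) <= 0) return tok;
  }
  return cands[cands.length - 1][0]; // guard against floating-point rounding
}

// Putting the sketches together:
// sampleToken(applyTemperature(topP(topK(candidates, 30), 0.2), 1));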
hope that helps.