Edit 17.10.24: title changed for easier searching.
Original title: What part of an OpenAI API request payload is limited by the max amount of tokens?
I roughly understand how tokens are counted from characters, but what exactly do I have to count? If I have a payload like this:
{
    "model": "gpt-3.5-turbo",
    "temperature": 1,
    "max_tokens": 400,
    "presence_penalty": 0.85,
    "frequency_penalty": 0.85,
    "messages": [
        {
            "role": "system",
            "content": "prompt"
        },
        {
            "role": "assistant",
            "content": "message"
        }
        // ... tens of messages
    ]
}
Do I have to count tokens for the payload as a whole, or only for "messages"? If it's only "messages", do I also have to count the JSON syntax characters, like spaces, brackets, and commas? What about the "role" and "content" keys? What about the "role" values? Or do I simply concatenate all the "content" values into a single string and count tokens on that alone?
From my understanding and calculations, only the items in the "messages" list are counted: the "role" and "content" values of each message, plus a small fixed per-message overhead. The JSON syntax is not counted; the spaces, brackets, commas, quotes, and the key names themselves never reach the model, because the messages are converted to a plain chat format (ChatML) before tokenization.
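You can check which pieces actually produce tokens by encoding individual strings with tiktoken. A minimal sketch (the strings are placeholders taken from the payload above):

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # cl100k_base

# Only the string values produce tokens; the JSON punctuation around them
# is never sent to the model, so it contributes nothing.
print(len(encoding.encode("system")))   # tokens in a "role" value
print(len(encoding.encode("prompt")))   # tokens in a "content" value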
I use the following script, provided by OpenAI, to calculate the number of tokens in my input. I have modified it to also estimate the cost of the input across multiple messages (not the output response), and it has been fairly accurate for me.
import tiktoken
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"num_tokens_from_messages() is not implemented for model {model}. "
            "See https://github.com/openai/openai-python/blob/main/chatml.md "
            "for information on how messages are converted to tokens."
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message  # fixed overhead for the ChatML delimiters
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))  # only the values are encoded, not the keys
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
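# Quick sanity check with the two messages from the question's payload
# (placeholder content); the result is the encoded role/content values
# plus the fixed per-message and reply-priming overheads.
example_messages = [
    {"role": "system", "content": "prompt"},
    {"role": "assistant", "content": "message"},
]
print(num_tokens_from_messages(example_messages))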
convo_lens = []
for ex in dataset:  # your list of inputs, each a dict with a "messages" key
    messages = ex["messages"]
    convo_lens.append(num_tokens_from_messages(messages))

# Cap each conversation at the 4K context window of gpt-3.5-turbo.
n_input_tokens_in_dataset = sum(min(4096, length) for length in convo_lens)
print(f"Input portion of the data has ~{n_input_tokens_in_dataset} tokens")
# Prices in USD per 1,000 tokens, as of Aug 29, 2023.
costs = {
    "gpt-4-0613":             {"input": 0.03,   "output": 0.06},
    "gpt-4-32k-0613":         {"input": 0.06,   "output": 0.12},
    "gpt-3.5-turbo-0613":     {"input": 0.0015, "output": 0.002},
    "gpt-3.5-turbo-16k-0613": {"input": 0.003,  "output": 0.004},
}

# We price the input with gpt-3.5-turbo here.
print(f"Cost of inference: ${(n_input_tokens_in_dataset / 1000) * costs['gpt-3.5-turbo-0613']['input']}")