Hello Stack Overflow community,
I've been working on integrating ChatGPT's API into my project, and I'm having some trouble calculating the total number of tokens for my API requests. Specifically, I'm passing both messages and functions in my API calls.
I've managed to figure out how to calculate the token count for the messages, but I'm unsure about how to account for the tokens used by the functions.
Could someone please guide me on how to properly calculate the total token count, including both messages and functions, for a request to ChatGPT's API?
Any help or insights would be greatly appreciated!
Thank you in advance.
I have been working on brute-forcing a solution by formatting the data in the call in different ways, using the OpenAI tokenizer and Tiktokenizer to test my formats.
I am going to walk you through calculating the tokens for gpt-3.5 and gpt-4. You can apply a similar method to other models; you just need to find the right settings.
We are going to calculate the tokens used by the messages and the functions separately, then add them together at the end to get the total.
Start by getting the tokenizer using tiktoken. We will use this to tokenize all the custom text in the messages and functions. Also add constants for the extra tokens the API will add to the request.
enc = tiktoken.encoding_for_model(model)
Make a variable to hold the total tokens for the messages and set it to 0.
msg_token_count = 0
Loop through the messages, and for each message add 3 to `msg_token_count`. Then loop through each key/value pair in the message, encode the value, and add the length of the encoded value to `msg_token_count`. If the message has the "name" key set, add 1 additional token to `msg_token_count`.
for message in messages:
    msg_token_count += 3  # Add tokens for each message
    for key, value in message.items():
        msg_token_count += len(enc.encode(value))  # Add tokens for each value in the message
        if key == "name":
            msg_token_count += 1  # Add a token if name is set
Finally, we need to add 3 to `msg_token_count` for the ending tokens.
msg_token_count += 3  # Add tokens to account for ending
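Wrapped up as a function, the message-counting logic above looks like this. The `toy_encode` stand-in below (one "token" per whitespace-separated word) is only for illustration; in practice you would pass `enc.encode` from tiktoken:

```python
def count_message_tokens(messages, encode):
    """Estimate message tokens: 3 per message, plus the encoded values,
    plus 1 if "name" is set, plus 3 for the ending."""
    total = 0
    for message in messages:
        total += 3                        # per-message overhead
        for key, value in message.items():
            total += len(encode(value))   # tokens in each value
            if key == "name":
                total += 1                # extra token when name is set
    total += 3                            # ending tokens
    return total

# Toy encoder for illustration only; use enc.encode with a real model
toy_encode = lambda text: text.split()

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]
print(count_message_tokens(messages, toy_encode))  # (3+1+3) + (3+1+1) + 3 = 15
```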
Now we are going to calculate the number of tokens the functions will take.
Start by making a variable to hold the total tokens used by functions and set it to 0.
func_token_count = 0
Next we are going to loop through the functions and add tokens to `func_token_count`. For each function, add 7 to `func_token_count`, then add the length of the encoded name and description.
If the function has properties, add 3 to `func_token_count`. Then for each key in the properties, add another 3 plus the length of the encoded property, making sure to subtract 3 if it has an "enum" key and to add 3 for each item in the enum list.
Finally, add 12 to `func_token_count` to account for the tokens at the end of all the functions.
for function in functions:
    func_token_count += 7  # Add tokens for the start of each function
    f_name = function["name"]
    f_desc = function["description"]
    if f_desc.endswith("."):
        f_desc = f_desc[:-1]
    line = f_name + ":" + f_desc
    func_token_count += len(enc.encode(line))  # Add tokens for the name and description
    if len(function["parameters"]["properties"]) > 0:
        func_token_count += 3  # Add tokens for the start of the properties
        for key in list(function["parameters"]["properties"].keys()):
            func_token_count += 3  # Add tokens for each property
            p_name = key
            p_type = function["parameters"]["properties"][key]["type"]
            p_desc = function["parameters"]["properties"][key]["description"]
            if "enum" in function["parameters"]["properties"][key].keys():
                func_token_count -= 3  # Subtract tokens if the property has an enum list
                for item in function["parameters"]["properties"][key]["enum"]:
                    func_token_count += 3
                    func_token_count += len(enc.encode(item))
            if p_desc.endswith("."):
                p_desc = p_desc[:-1]
            line = f"{p_name}:{p_type}:{p_desc}"
            func_token_count += len(enc.encode(line))
func_token_count += 12
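The same walkthrough for functions, as a self-contained sketch. Again, `toy_encode` is only a stand-in for `enc.encode`, and the weather function is a made-up example, not part of the original question:

```python
def count_function_tokens(functions, encode):
    total = 0
    for function in functions:
        total += 7                                  # per-function overhead
        desc = function["description"]
        if desc.endswith("."):
            desc = desc[:-1]
        total += len(encode(function["name"] + ":" + desc))
        props = function["parameters"]["properties"]
        if len(props) > 0:
            total += 3                              # start of the properties
            for name, spec in props.items():
                total += 3                          # per-property overhead
                if "enum" in spec:
                    total -= 3                      # enum adjustment
                    for item in spec["enum"]:
                        total += 3 + len(encode(item))
                p_desc = spec["description"]
                if p_desc.endswith("."):
                    p_desc = p_desc[:-1]
                total += len(encode(f"{name}:{spec['type']}:{p_desc}"))
    total += 12                                     # end of all functions
    return total

toy_encode = lambda text: text.split()  # illustration only; use enc.encode in practice

functions = [{
    "name": "get_weather",
    "description": "Get the weather.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "description": "Unit",
                     "enum": ["celsius", "fahrenheit"]},
        },
    },
}]
print(count_function_tokens(functions, toy_encode))  # 39
```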
Here is the full code. Note that instead of hard-coding the additional token counts, I used constants to hold the values.
import tiktoken

def get_token_count(model, messages, functions):
    # Initialize message settings to 0
    msg_init = 0
    msg_name = 0
    msg_end = 0
    # Initialize function settings to 0
    func_init = 0
    prop_init = 0
    prop_key = 0
    enum_init = 0
    enum_item = 0
    func_end = 0
    if model in [
        "gpt-3.5-turbo-0613",
        "gpt-4-0613"
    ]:
        # Set message settings for the above models
        msg_init = 3
        msg_name = 1
        msg_end = 3
        # Set function settings for the above models
        func_init = 7
        prop_init = 3
        prop_key = 3
        enum_init = -3
        enum_item = 3
        func_end = 12
    enc = tiktoken.encoding_for_model(model)
    msg_token_count = 0
    for message in messages:
        msg_token_count += msg_init  # Add tokens for each message
        for key, value in message.items():
            msg_token_count += len(enc.encode(value))  # Add tokens for each value in the message
            if key == "name":
                msg_token_count += msg_name  # Add tokens if name is set
    msg_token_count += msg_end  # Add tokens to account for ending
    func_token_count = 0
    if len(functions) > 0:
        for function in functions:
            func_token_count += func_init  # Add tokens for the start of each function
            f_name = function["name"]
            f_desc = function["description"]
            if f_desc.endswith("."):
                f_desc = f_desc[:-1]
            line = f_name + ":" + f_desc
            func_token_count += len(enc.encode(line))  # Add tokens for the name and description
            if len(function["parameters"]["properties"]) > 0:
                func_token_count += prop_init  # Add tokens for the start of the properties
                for key in list(function["parameters"]["properties"].keys()):
                    func_token_count += prop_key  # Add tokens for each property
                    p_name = key
                    p_type = function["parameters"]["properties"][key]["type"]
                    p_desc = function["parameters"]["properties"][key]["description"]
                    if "enum" in function["parameters"]["properties"][key].keys():
                        func_token_count += enum_init  # Subtract tokens if the property has an enum list
                        for item in function["parameters"]["properties"][key]["enum"]:
                            func_token_count += enum_item
                            func_token_count += len(enc.encode(item))
                    if p_desc.endswith("."):
                        p_desc = p_desc[:-1]
                    line = f"{p_name}:{p_type}:{p_desc}"
                    func_token_count += len(enc.encode(line))
        func_token_count += func_end
    return msg_token_count + func_token_count
Please let me know if something is not clear, or if you have a suggestion to make my post better.