openai-api, chatgpt-api, gpt-3, gpt-4

Properly count the number of tokens in the whole request payload - OpenAI


17.10.24: Title edited for easier search
Original title: What part of OpenAI API request payload is limited by the max amount tokens?


I kinda understand how to count tokens from characters, but what do I actually have to count? If I have a payload like this:

{
  "model": "gpt-3.5-turbo",
  "temperature": 1,
  "max_tokens": 400,
  "presence_penalty": 0.85,
  "frequency_penalty": 0.85,
  "messages": [
    {
      "role": "system",
      "content": "prompt"
    },
    {
      "role": "assistant",
      "content": "message"
    },
    // tens of messages
  ]
}

Do I have to count tokens for the entire payload? Or only for "messages"? If so, do I have to count all the JSON syntax characters too, like spaces, brackets and commas? What about the "role" and "content" keys? What about the "role" values?
Or do I simply concatenate all the "content" values into a single string and count tokens based on that alone?


Solution

  • From my understanding and calculations, only the tokens in the "messages" list are counted. The values of "role" and "content" are tokenized, plus a small fixed overhead per message (for the chat wrapper the API builds around each message); the JSON syntax itself, i.e. spaces, brackets, commas, and quotes, is not counted.
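
    You can sanity-check this with tiktoken directly: tokenizing only the "role" and "content" values (plus the fixed per-message overhead) lines up with the prompt_tokens the API reports in its usage field, while naively tokenizing the serialized JSON gives a noticeably larger count. A minimal sketch, using the placeholder strings from the question:

    import json
    import tiktoken
    
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    
    message = {"role": "system", "content": "prompt"}
    
    # What the model actually "sees": the role and content values...
    value_tokens = len(encoding.encode(message["role"])) + len(encoding.encode(message["content"]))
    # ...versus tokenizing the serialized JSON, braces and quotes included.
    json_tokens = len(encoding.encode(json.dumps(message)))
    
    print(value_tokens, json_tokens)  # the JSON version is noticeably larger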

    I use the following script, provided by OpenAI, to calculate the number of tokens in my input. I have modified it to calculate the cost of the input across multiple messages (not the output response), and it has been fairly accurate for me.

    import tiktoken
    
    def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
        """Return the number of tokens used by a list of messages."""
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            print("Warning: model not found. Using cl100k_base encoding.")
            encoding = tiktoken.get_encoding("cl100k_base")
        if model in {
            "gpt-3.5-turbo-0613",
            "gpt-3.5-turbo-16k-0613",
            "gpt-4-0613",
            "gpt-4-32k-0613",
            }:
            tokens_per_message = 3
            tokens_per_name = 1
        elif model == "gpt-3.5-turbo-0301":
            tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
            tokens_per_name = -1  # if there's a name, the role is omitted
        elif "gpt-3.5-turbo" in model:
            print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
            return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
        elif "gpt-4" in model:
            print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
            return num_tokens_from_messages(messages, model="gpt-4-0613")
        else:
            raise NotImplementedError(
                f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
            )
        num_tokens = 0
        for message in messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
        return num_tokens
    
    convo_lens = []
    
    for ex in dataset:  # dataset: your list of examples, each a dict with a "messages" list
        messages = ex["messages"]
        convo_lens.append(num_tokens_from_messages(messages))
    
    # Clamp each conversation at gpt-3.5-turbo's 4,096-token context window.
    n_input_tokens_in_dataset = sum(min(4096, length) for length in convo_lens)
    print(f"Input portion of the data has ~{n_input_tokens_in_dataset} tokens")
    
    # Prices in USD per 1K tokens, as of Aug 29, 2023.
    costs = {
        "gpt-4-0613": {
            "input": 0.03,
            "output": 0.06
        },
        "gpt-4-32k-0613": {
            "input": 0.06,
            "output": 0.12
        },
        "gpt-3.5-turbo-0613": {
            "input": 0.0015,
            "output": 0.002
        },
        "gpt-3.5-turbo-16k-0613": {
            "input": 0.003,
            "output": 0.004
        }
    }
    
    # We select gpt-3.5-turbo-0613 here; this covers the input side only.
    print(f"Cost of inference: ${(n_input_tokens_in_dataset/1000) * costs['gpt-3.5-turbo-0613']['input']}")