azure, azure-cognitive-services, cost-management, azure-openai

How can I keep track of the expenses of each program separately when calling the same OpenAI GPT model deployed in an Azure instance?


I have an OpenAI GPT model deployed in an instance belonging to a resource in my Azure subscription. I have two programs that use this OpenAI GPT model. How can I keep track of the expenses of each program separately?


Example: I deployed the OpenAI GPT model "GPT 4 32k" as gpt-4-32k-viet. Program A and program B both use this deployment. How can I keep track of the expenses incurred by program A and program B separately?


I use the code from the Azure OpenAI tutorial:

import tiktoken
import openai
import os
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = "https://[resourcename].openai.azure.com/" # Your Azure OpenAI resource's endpoint value .
openai.api_key = "[my instance key]"


system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit= 4096
conversation=[]
conversation.append(system_message)


def num_tokens_from_messages(messages, model="gpt-4-32k"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


user_input = 'Hi there. What is the difference between Facebook and TikTok?'
conversation.append({"role": "user", "content": user_input})
conv_history_tokens = num_tokens_from_messages(conversation)

while (conv_history_tokens + max_response_tokens >= token_limit):
    del conversation[1]
    conv_history_tokens = num_tokens_from_messages(conversation)

response = openai.ChatCompletion.create(
    engine="gpt-4-32k-viet",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
    messages=conversation,
    temperature=.7,
    max_tokens=max_response_tokens,
)

conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
print("\n" + response['choices'][0]['message']['content'] + "\n")

Solution

  • You have to put them in different Resource Groups, so that each program calls an Azure OpenAI resource in its own Resource Group.

    You can then target each Resource Group in cost analysis and group by Service Name if you want a more granular view.

    We are having a hard time understanding the real costs of GPT, and the only approach I can suggest is to test extensively.

    If you feed the language model 500 characters, you incur a certain cost.

    But if you feed it 5,000 characters, don't expect the cost to be exactly 10 times higher.

    Costs are difficult to forecast, so what I suggest is isolating each program in its own Resource Group. This technology is not designed to be multi-tenant: on a shared resource you lose track of who incurred which costs. If you want to know how much each customer has consumed, the only way is to go single-tenant.

    Otherwise you have to create an ID per customer and link each token to that ID. And good luck with that.
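The last option (one ID per consumer, with token counts linked to that ID) can be sketched as follows. This is a minimal illustration, not a built-in Azure or `openai` feature: `UsageLedger` and the program IDs are hypothetical names, and the per-1K-token prices are placeholders to replace with the current rates for your deployment. It relies only on the `usage` field (`prompt_tokens`, `completion_tokens`) that the chat completions response returns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0


class UsageLedger:
    """Accumulates token counts per caller ID (hypothetical helper, not part of the openai package)."""

    def __init__(self, prompt_price_per_1k, completion_price_per_1k):
        # Prices are assumptions supplied by the caller, in currency units per 1,000 tokens.
        self.prompt_price_per_1k = prompt_price_per_1k
        self.completion_price_per_1k = completion_price_per_1k
        self._totals = defaultdict(Usage)

    def record(self, program_id, usage):
        """Add the `usage` mapping of one chat completion response to a program's running total."""
        entry = self._totals[program_id]
        entry.prompt_tokens += usage["prompt_tokens"]
        entry.completion_tokens += usage["completion_tokens"]

    def tokens(self, program_id):
        """Return (prompt_tokens, completion_tokens) recorded so far for a program."""
        entry = self._totals[program_id]
        return entry.prompt_tokens, entry.completion_tokens

    def estimated_cost(self, program_id):
        """Estimate the cost from the recorded tokens and the configured prices."""
        entry = self._totals[program_id]
        return (entry.prompt_tokens / 1000 * self.prompt_price_per_1k
                + entry.completion_tokens / 1000 * self.completion_price_per_1k)


# Placeholder prices: replace with the rates that apply to your deployment.
ledger = UsageLedger(prompt_price_per_1k=0.06, completion_price_per_1k=0.12)

# In a real program this would be: ledger.record("program_a", response["usage"])
# Here the usage dictionaries are hard-coded so the sketch runs without calling the API.
ledger.record("program_a", {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500})
ledger.record("program_a", {"prompt_tokens": 200, "completion_tokens": 50, "total_tokens": 250})
ledger.record("program_b", {"prompt_tokens": 300, "completion_tokens": 100, "total_tokens": 400})

for program_id in ("program_a", "program_b"):
    print(program_id, ledger.tokens(program_id), round(ledger.estimated_cost(program_id), 4))
```

With the code from the question, the call would go right after `openai.ChatCompletion.create(...)`, passing `response["usage"]`. Because program A and program B are separate processes, each would need to write its totals to a shared log or database; the in-memory ledger here only shows the bookkeeping. The result is an estimate and may not match the invoice exactly.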