python openai-api chatgpt-api gpt-4

GPT python SDK introduces massive overhead / incorrect timeout


I've been using the openai Python package v0.28.1 with the request_timeout param, which worked OK. I then updated to the 1.x version, only to find out that the timeout no longer works as expected (they have changed the param name from request_timeout to timeout).

Here is an odd behavior with the current newest version (1.14.1):

from openai import OpenAI, APITimeoutError
import os

client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
)

for timeout in [0.001, 0.1, 1, 2]:
    with log_duration('openai query') as duration_context:
        try:
            response = client.chat.completions.create(  # type: ignore[call-overload]
                model="gpt-4-0125-preview",
                messages=[{'content': 'describe the universe in 10000 characters', 'role': 'system'}],
                temperature=0.0,
                max_tokens=450,
                top_p=1,
                timeout=timeout
            )
        except APITimeoutError as e:
            continue

log_duration just measures the time it takes. The results are:

2024-03-20 14:59:19 [info     ] openai query duration=2.805093 duration=2.8050930500030518 name=openai query
2024-03-20 14:59:22 [info     ] openai query duration=2.844164 duration=2.8441641330718994 name=openai query
2024-03-20 14:59:29 [info     ] openai query duration=6.396946 duration=6.396945953369141 name=openai query
2024-03-20 14:59:38 [info     ] openai query duration=9.387082 duration=9.387081861495972 name=openai query

which is way more than the timeouts. We have been getting a bunch of timeouts on our Lambdas without understanding why, as the timeout on the OpenAI call is supposed to be much lower.

What am I missing? Is there such a big overhead in OpenAI's 1.x Python SDK?


Solution

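  • Requests that time out are retried twice by default, with a short exponential backoff. A single call can therefore make up to three attempts, each with its own timeout, before APITimeoutError is raised, which is why the measured durations are several times the configured timeout.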

    You can use the max_retries option to configure or disable retry settings:

    from openai import OpenAI
    
    # Configure the default for all requests:
    client = OpenAI(
        # default is 2
        max_retries=0,
    )
    
    # Or, configure per-request:
    client.with_options(max_retries=5).chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "How can I get the name of the current day in Node.js?",
            }
        ],
        model="gpt-3.5-turbo",
    )