Python & HTTPX: How does httpx client's connection pooling work?


Consider this function that makes a simple GET request to an API endpoint:

import httpx

def check_status_without_session(url: str) -> int:
    response = httpx.get(url)
    return response.status_code

Every call to check_status_without_session opens a new TCP connection. The HTTPX documentation recommends using the Client API when making multiple requests to the same host. The following function does that:

import httpx

def check_status_with_session(url: str) -> int:
    with httpx.Client() as client:
        response = client.get(url)
        return response.status_code

According to the docs, using Client ensures that:

... a Client instance uses HTTP connection pooling. This means that when you make several requests to the same host, the Client will reuse the underlying TCP connection, instead of recreating one for every single request.

My question is about the second case, where I have wrapped the Client context manager in a function. If I call check_status_with_session multiple times with the same URL, wouldn't that just create a new pool of connections each time the function is called? That would mean the connections are not actually being reused. Since the function's stack frame is destroyed once the function returns, the Client object should be destroyed as well, right? Is there any advantage in doing it like this, or is there a better way?


Solution

  • Is there any advantage in doing it like this or is there a better way?

    No, there is no advantage to using httpx.Client in the way you've shown. In fact, the top-level httpx.<method> API, e.g. httpx.get, does essentially the same thing: it creates a short-lived Client for that single request.

    The "pool" is a feature of the transport held by the Client, which is HTTPTransport by default. The transport is created when the Client is initialized and is stored as the instance attribute self._transport.

    Creating a new Client instance means a new HTTPTransport instance, and transport instances have their own TCP connection pool. By creating a new Client instance each time and using it only once, you get no benefit over using e.g. httpx.get directly.

    And that might be OK! Connection pooling is an optimization over creating a new TCP connection for each request. Your application may not need that optimization; it may already be fast enough for your needs.

    If you are making many requests to the same host in a tight loop, running the loop inside the context of a single Client may give you some throughput gains, e.g.

    with httpx.Client(base_url="https://example.com") as client:
        results = [client.get(f"/api/resource/{idx}") for idx in range(100)]
    

    For such I/O-heavy workloads you may do even better by issuing the requests concurrently, e.g. using httpx.AsyncClient.