When requesting a web resource, website, or web service with the requests library, the request takes a long time to complete. The code looks similar to the following:
import requests
requests.get("https://www.example.com/")
This request takes over 2 minutes (exactly 2 minutes 10 seconds) to complete! Why is it so slow, and how can I fix it?
There are several possible causes of this problem, each with a multitude of answers scattered across StackOverflow, so I will try to combine them all here to save you the hassle of searching for them.
In my search I have uncovered the following layers to this:
For many problems, enabling debug logging can help you uncover what is going wrong (source):
import requests
import logging
import http.client
http.client.HTTPConnection.debuglevel = 1
# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
requests.get("https://www.example.com")
In case the debug output does not help you solve the problem, read on.
It can be faster not to request the full data, but to send only a HEAD request (source):
requests.head("https://www.example.com")
Some servers don't support HEAD requests; in that case, you can try to stream the response instead (source):
requests.get("https://www.example.com", stream=True)
If you send multiple requests in a row, you can speed them up by utilizing a requests.Session. This makes sure the connection to the server stays open and configured, and as a nice benefit it also persists cookies. Try this (source):
import requests
session = requests.Session()
for _ in range(10):
    session.get("https://www.example.com")
If you send a very large number of requests at once, each request blocks execution. You can parallelize this utilizing, e.g., requests-futures (idea from kederrac):
from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession
with FuturesSession() as session:
    futures = [session.get("https://www.example.com") for _ in range(10)]
    for future in as_completed(futures):
        response = future.result()
Be careful not to overwhelm the server with too many requests at the same time.
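One way to avoid overwhelming the server is to cap how many requests run in parallel. The standard library's ThreadPoolExecutor offers a simple way to do this; here is a minimal sketch, where fetch is a hypothetical placeholder standing in for session.get:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Placeholder for session.get(url); swap in a real request here.
    return "response for " + url

urls = ["https://www.example.com/?page=%d" % i for i in range(10)]

# max_workers caps how many requests run at the same time.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch, url) for url in urls]
    responses = [future.result() for future in as_completed(futures)]
```

With max_workers=3, at most three requests are in flight at any moment, no matter how many URLs you queue up.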
If this also does not solve your problem, read on...
In many cases, the reason might lie with the server you are requesting from. First, verify this by requesting any other URL in the same fashion:
requests.get("https://www.google.com")
If this works fine, you can focus your efforts on the following possible problems:
The server might specifically block requests that use the default python-requests user agent, might utilize a whitelist, or might block you for some other reason. To send a nicer user-agent string, try this (source):
headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
requests.get("https://www.example.com", headers=headers)
If this problem only occurs sometimes, e.g. after a few requests, the server might be rate-limiting you. Check the response to see if it says something along those lines (e.g. "rate limit reached", "work queue depth exceeded" or similar; source). Here, the solution is simply to wait longer between requests, for example by using time.sleep().
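As a minimal sketch of waiting between requests (fetch is a hypothetical placeholder for requests.get, and the 0.2-second delay is an assumption; adjust it to the server's actual limits):

```python
import time

def fetch(url):
    # Placeholder for requests.get(url); swap in a real request here.
    return "response for " + url

urls = ["https://www.example.com/?page=%d" % i for i in range(3)]

responses = []
for url in urls:
    responses.append(fetch(url))
    time.sleep(0.2)  # wait between requests to respect the rate limit
```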
Sometimes the request itself returns quickly, but parsing the response is slow. You can check this by not reading the response you receive from the server. If the code is still slow, this is not your problem, but if this fixed it, the problem lies with parsing the response. Two known culprits are the handling of chunked responses (issue 1) and slow guessing of the response encoding when the server does not declare one (issue 2). To fix those, try:
r = requests.get("https://www.example.com")
r.raw.chunked = True  # Fix issue 1: treat the raw response as chunked
r.encoding = 'utf-8'  # Fix issue 2: skip slow encoding detection
print(r.text)
Finally, the slowness might be caused by IPv6. This might be the worst problem of all to find. An easy, albeit weird, way to check this is to add a timeout parameter as follows:
requests.get("https://www.example.com/", timeout=5)
If this returns a successful response, the problem should lie with IPv6. The reason is that requests first tries an IPv6 connection. When that times out, it tries to connect via IPv4. By setting the timeout low, you force it to switch to IPv4 within a shorter amount of time.
Verify by utilizing, e.g., wget or curl:
wget --inet6-only https://www.example.com -O - > /dev/null
# or
curl --ipv6 -v https://www.example.com
In both cases, we force the tool to connect via IPv6 to isolate the issue. If this times out, try again forcing IPv4:
wget --inet4-only https://www.example.com -O - > /dev/null
# or
curl --ipv4 -v https://www.example.com
If this works fine, you have found your problem! But how to solve it, you ask? You can force the connection to use IPv4 instead of IPv6 (the relevant socket address family is socket.AF_INET for IPv4). (In case you see the same problem with SSH, adding AddressFamily inet to your SSH config forces IPv4 there as well.)
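To force IPv4 from within Python for all sockets (including those opened by requests), one common trick is to monkeypatch socket.getaddrinfo so it only resolves IPv4 addresses. A minimal sketch; the wrapper name is mine:

```python
import socket

_orig_getaddrinfo = socket.getaddrinfo

def getaddrinfo_ipv4_only(host, port, family=0, *args, **kwargs):
    # Ignore the requested address family and resolve IPv4 addresses only.
    return _orig_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

socket.getaddrinfo = getaddrinfo_ipv4_only
```

After this patch, requests will connect via IPv4 only; restore _orig_getaddrinfo when you no longer need it.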