On one of my machines the Python module requests
is extremely slow, seemingly due to a very high number of internal function calls. What could lead to such an issue?
>>> import sys, cProfile, requests, urllib3
>>> sys.version
'3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]'
>>> requests.__version__
'2.28.1' # same issue with latest 2.29.0 release
>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
11717750 function calls (11717748 primitive calls) in 10.091 seconds
Ordered by: call count
ncalls tottime percall cumtime percall filename:lineno(function)
5220221 0.770 0.000 0.775 0.000 {method 'read' of '_io.TextIOWrapper' objects}
5220221 0.801 0.000 0.801 0.000 shlex.py:68(punctuation_chars)
468483 0.364 0.000 9.564 0.000 shlex.py:101(get_token)
401557 7.612 0.000 9.188 0.000 shlex.py:133(read_token)
200797 0.059 0.000 0.059 0.000 {method 'startswith' of 'str' objects}
66931 0.012 0.000 0.012 0.000 {method 'popleft' of 'collections.deque' objects}
66926 0.012 0.000 0.012 0.000 {method 'appendleft' of 'collections.deque' objects}
66926 0.049 0.000 0.061 0.000 shlex.py:72(push_token)
[..] # remaining calls with significantly lower numbers
>>> cProfile.run('http=urllib3.PoolManager(); http.request("GET", "http://127.0.0.1:8080")', sort='ncalls')
784 function calls (783 primitive calls) in 0.005 seconds
On another machine, running the same version of Python and tested with the same versions of `requests`, things look about right:
>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
2432 function calls (2396 primitive calls) in 0.005 seconds
Some observations:

- The server under test was `python3 -m http.server 8080 --bind 127.0.0.1`, but the same results occur with other HTTP servers.
- `tcpdump` shows nothing unusual on the wire; `urllib3` (see above) and completely unrelated tools such as `curl` work just fine.
- Forcing IPv4 in `requests` using `requests.packages.urllib3.util.connection.HAS_IPV6 = False` did not have any effect either.

Using the Python profile viewer `tuna` I found that `requests` will attempt to get authentication credentials from a netrc file (`$HOME/.netrc` or `$HOME/_netrc`) if no other authentication method is specified.
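If the netrc lookup is the bottleneck, it can be bypassed. A minimal sketch, assuming a throwaway local server in place of the one used above (dummy credentials and the URL are illustrative): passing explicit `auth`, or setting `Session.trust_env = False`, stops `requests` from consulting the netrc file at all.

```python
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

import requests

# Throwaway local server standing in for the http.server used above;
# port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Option 1: explicit auth (even dummy credentials) -- requests only falls
# back to the netrc lookup when no other authentication is supplied.
resp1 = requests.get(url, auth=("user", "pass"))

# Option 2: trust_env=False disables environment-derived settings for the
# whole session, including the netrc lookup and proxy variables.
session = requests.Session()
session.trust_env = False
resp2 = session.get(url)

server.shutdown()
print(resp1.status_code, resp2.status_code)
```

Either option avoids the expensive file parse on every request; `trust_env = False` is the broader switch, so check that you don't also rely on proxy environment variables before using it.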
Instead of just looking at the number of calls (`ncalls`), sorting by cumulative time (`sort='cumtime'`) would also have helped highlight that `netrc.py` and the subsequent `shlex.py` calls were the main culprits.
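For reference, a minimal sketch of the cumtime-sorted pattern using `pstats`; the `workload` function is a hypothetical stand-in for the slow `requests.get()` call:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for the slow requests.get() call being investigated.
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# sort_stats("cumtime") ranks functions by cumulative time, which surfaces
# the expensive call chain instead of burying it under cheap one-liners.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumtime").print_stats(10)
print(buf.getvalue())
```

With `requests` as the workload, the top of this listing points straight at `sessions.py` → `netrc.py` → `shlex.py` rather than at millions of individual `read` calls.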
In my case I had a fairly large netrc file with thousands of entries. Upon trimming it, or removing the netrc file entirely, the number of function calls shrank accordingly.
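The effect can be reproduced in isolation by feeding the stdlib `netrc` parser (the same parser `requests` uses under the hood) a synthetic file; the entry count and hostnames below are made up:

```python
import netrc
import os
import tempfile
import time

# Generate a synthetic netrc file; the entry count is arbitrary.
entries = 2000
with tempfile.NamedTemporaryFile("w", suffix="_netrc", delete=False) as f:
    for i in range(entries):
        f.write(f"machine host{i}.example.com login user{i} password s{i}\n")
    path = f.name

start = time.perf_counter()
parsed = netrc.netrc(path)  # parse the whole file, as requests does per lookup
elapsed = time.perf_counter() - start
os.remove(path)

print(f"parsed {len(parsed.hosts)} entries in {elapsed:.3f}s")
```

Scaling `entries` up shows how quickly the parse time grows, which matches the profile above: every unauthenticated `requests.get()` pays this cost once.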
Ideally the performance of the `netrc` Python module wouldn't be such a bottleneck, as the file itself isn't all that complex, and other programs like `curl` have no issues handling it, but oh well.