pythonpython-requests.netrc

Python requests module slow due to high number of internal function calls


On one of my machines the Python module requests is extremely slow, seemingly due to a very high number of internal function calls. What could lead to such an issue?

>>> import sys, cProfile, requests, urllib3
>>> sys.version
'3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]'
>>> requests.__version__
'2.28.1' # same issue with latest 2.29.0 release

>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
        11717750 function calls (11717748 primitive calls) in 10.091 seconds

    Ordered by: call count

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   5220221    0.770   0.000     0.775    0.000 {method 'read' of '_io.TextIOWrapper' objects}
   5220221    0.801   0.000     0.801    0.000 shlex.py:68(punctuation_chars)
    468483    0.364   0.000     9.564    0.000 shlex.py:101(get_token)
    401557    7.612   0.000     9.188    0.000 shlex.py:133(read_token)
    200797    0.059   0.000     0.059    0.000 {method 'startswith' of 'str' objects}
     66931    0.012   0.000     0.012    0.000 {method 'popleft' of 'collections.deque' objects}
     66926    0.012   0.000     0.012    0.000 {method 'appendleft' of 'collections.deque' objects}
     66926    0.049   0.000     0.061    0.000 shlex.py:72(push_token)
      [..] # remaining calls with significantly lower numbers

>>> cProfile.run('http=urllib3.PoolManager(); http.request("GET", "http://127.0.0.1:8080")', sort='ncalls')
        784 function calls (783 primitive calls) in 0.005 seconds

On another machines, running the same version of Python and tested with same versions of requests, things look about right:

>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
        2432 function calls (2396 primitive calls) in 0.005 seconds

Solution

  • Using the Python profile viewer tuna I found that requests will attempt to get authentication credentials from a netrc file ($HOME/.netrc or $HOME/_netrc) if no other authentication method is specified:

    Instead of just looking at number of calls (ncalls), sorting by cumulative time (sort=cumtime) would also have helped highlight how the issuing of netrc.py and subsequent shlex.py were the main culprits.

    In my case I had a fairly large netrc file at thousands of entries. Upon trimming or removing the netrc file entirely the subsequent number of function calls shrunk accordingly.


    Ideally the performance of the netrc Python module wouldn't be such a bottleneck, as the file itself isn't all that complex, and other programs like curl has no issues handling it, but oh well.