pythonprofilecprofilepstats

Why is pstats.print_callers() ignores the restriction argument?


I'm trying to profile my python script using cProfile and displaying the results with pstats. In particular, I'm trying to use the pstats function p.sort_stats('time').print_callers(20) to print only the top 20 functions by time, as decribed in the documentation.

I expect to get only the top 20 results (functions profiled and their calling functions ordered by time), instead, I get a seemingly unfiltered list of over 1000 functions that completely saturates my terminal (hence I'm estimating over a 1000 functions).

Why is my restriction argument (i.e. 20) being ignored by print_callers() and how can I fix this?

I've tried looking up an answer and couldn't find one. And I tried to create a minimal reproducible example, but when I do, I can't reproduce the problem (i.e. it works fine).

my profiling code is:

import cProfile
import pstats

if __name__ == '__main__':
    cProfile.run('main()', 'mystats')
    p = pstats.Stats('mystats')
    p.sort_stats('time').print_callers(20)

I'm trying to avoid having to post my full code, so if someone else has encountered this issue before, and can answer without seeing my full code, that would be great.

Thank you very much in advance.

Edit 1: Partial output:

  Ordered by: internal time
   List reduced from 1430 to 1 due to restriction <1>

Function                               was called by...
                                           ncalls  tottime  cumtime
{built-in method builtins.isinstance}  <-    2237    0.000    0.000  <frozen importlib._bootstrap>:997(_handle_fromlist)
                                                9    0.000    0.000  <frozen importlib._bootstrap_external>:485(_compile_bytecode)
                                               44    0.000    0.000  <frozen importlib._bootstrap_external>:1117(_get_spec)
                                             4872    0.001    0.001  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\_strptime.py:321(_strptime)
                                                5    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\abc.py:196(__subclasscheck__)
                                               26    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\calendar.py:58(__getitem__)
                                               14    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\calendar.py:77(__getitem__)
                                                2    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\distutils\version.py:331(_cmp)
                                               20    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\enum.py:797(__or__)
                                              362    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\enum.py:803(__and__)
                                                1    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\inspect.py:73(isclass)
                                               30    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\json\encoder.py:182(encode)
                                                2    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:34(_get_bothseps)
                                                1    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:75(join)
                                                4    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:122(splitdrive)
                                                3    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:309(expanduser)
                                                4    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\os.py:728(check_str)
                                               44    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\re.py:249(escape)
                                                4    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\re.py:286(_compile)
                                              609    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\dateutil\parser\_parser.py:62(__init__)
                                             1222    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py:48(_count_reduce_items)
                                             1222    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py:58(_mean)
                                                1    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\arrayprint.py:834(__init__)
                                             1393    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:1583(ravel)
                                             1239    0.000    0.000  C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:1966(sum)
...

Solution

  • I figured out the issue. As usual, the Python Library does not have a bug, rather, I misunderstood the output of the function call.

    I'm elaborating it here as an answer in case it helps anyone clear up this misunderstanding in the future.

    When I asked the question, I didn't understand why p.print_callers(20) prints out to terminal over a thousand lines, even though I am restricting it to the top 20 function calls (by time).

    What is actually happening is that my restriction of printing the top 20 "most time consuming functions", restricts the list to the top 20 functions, but then prints a list of all the functions that called each of the top twenty functions.

    Since each of the top 20 functions was called (on average) by about 100 different functions each one of the top functions, had about 100 lines associated with it. so 20*100=2000, and so p.print_callers(20) printed well over a 1000 lines and saturated my terminal.

    I hope this saves someone some time and debugging headache :)