pythoncprofile

find caller of python builtins


This is similar to Profiling in Python: Who called the function? but with a twist. I have a python function that I am profiling with cProfile. On my first run, stats showed it spending most of its time in get_loc() from datetimes.py;

Fine, I found the datetimes.py file on my system and temporarily added import pdb and pdb.set_trace() at the top of get_loc() and then randomly used debugger command where several times with a more-or-less random number of continue's between wheres. This showed me that, per my random sample, pretty much all of the couple thousand calls to get_loc() where coming ultimately from the same place in my own code. I modified that code to significantly reduce the number of calls. Great!

Next up, I now see a large amount of time being spent in over 66,000 calls to builtins.isinstance(); The problem is I can't find the source code to implement a stack trace to find out what's calling isinstance(). (I am guessing, based on my limited knowledge of Python, that isinstance is actually C code that is either directly, or via shared object, linked into the Python interpreter). At any rate, I can't find what is calling is isinstance. I tried the Gprof2dot method also found among the answers of Profiling in Python: Who called the function? but that only shows me that isinstance() is being called by is_dtype() and the call graph stops there (at is_dtype).

Any ideas?


Solution

  • I've done some digging, and will attempt, at least partially, to answer my question (in the hopes that others will see and contribute additional ideas on how to make this easier).

    So here's how I profile my function

    pr = cProfile.Profile()                # create a cProfiler object 
    pr.enable()                            # turn profiling on
    pr.runcall( myfunc, arg1, arg2, ... )  # profile my function
    pr.disable()                           # turn profiling off
    

    Then, here is the key to finding callers:

    import pstats
    p = pstats.Stats(pr)           # create pstats obj based on profiler above.
    p.print_callers('isinstance')  # find all the callers of isinstance.
    

    The print_callers function prints out 130 callers of isinstance, along with how many times each caller calls isinstance. That's a lot to digest, but for profiling purposes it makes sense to focus on the handful that call the most, and find their callers. Here is part of the list (which print_callers presents in more-or-less random order, but I have sorted them here based on the number of calls) ...

    Function {built-in method builtins.isinstance} was called by...
                                               ncalls  tottime  cumtime
                                                 2808    0.002    0.002  /anaconda3/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py:974(get_loc)
                                                 2158    0.006    0.018 /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/base.py:75(is_dtype)
                                                 2030    0.001    0.001  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py:1845(_is_dtype_type)
                                                 1930    0.001    0.001  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py:1981(pandas_dtype)
                                                 1440    0.001    0.001  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py:68(find)
                                                 1058    0.001    0.001  /anaconda3/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py:924(get_value)
                                                  841    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py:1809(_get_dtype)
                                                  726    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py:1702(is_extension_arr
    ...
                                                    6    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:153(_get_setitem_indexer)
                                                    5    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/ops.py:1660(wrapper)
                                                    5    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/ops.py:1447(_align_method_SERIES)
                                                    5    0.000    0.000 /anaconda3/lib/python3.7/site-packages/pandas/core/computation/expressions.py:163(_has_bool_dtype)
                                                    4    0.000    0.000 /anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py:3100(_extend_blocks)
                                                    4    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:2501(check_setitem_lengths)
                                                    3    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py:566(_shallow_copy)
                                                    3    0.000    0.000  /anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py:702(is_datetimelike)
                                                    3    0.000    0.000  /anaconda3/lib/python3.7/posixpath.py:41(_get_sep)
                                                    2    0.000    0.000  /anaconda3/lib/python3.7/distutils/version.py:331(_cmp)
    

    At this point, I suppose I could tediously go through the biggest callers (find their biggest callers, and so on) or write a script to do it. Will continue digging and post later additional progress.