python-3.x cpython python-internals

Finding the source code of methods implemented in C?


Please note that I am asking this question solely for informational purposes.

I know the title sounds like a duplicate of "Finding the source code for built-in Python functions?", but let me explain.

Say, for example, I want to find the source code of the most_common method of the collections.Counter class. Since the Counter class is implemented in Python, I can use the inspect module to get its source code.

>>> import inspect
>>> import collections
>>> print(inspect.getsource(collections.Counter.most_common))

This will print

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abcdeabcdabcaba').most_common(3)
        [('a', 5), ('b', 4), ('c', 3)]

        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))

However, if the method or class is implemented in C, inspect.getsource raises TypeError.

>>> my_list = []
>>> print(inspect.getsource(my_list.append))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\username\AppData\Local\Programs\Python\Python36-32\lib\inspect.py", line 968, in getsource
    lines, lnum = getsourcelines(object)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36-32\lib\inspect.py", line 955, in getsourcelines
    lines, lnum = findsource(object)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36-32\lib\inspect.py", line 768, in findsource
    file = getsourcefile(object)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36-32\lib\inspect.py", line 684, in getsourcefile
    filename = getfile(object)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36-32\lib\inspect.py", line 666, in getfile
    'function, traceback, frame, or code object'.format(object))
TypeError: <built-in method append of list object at 0x00D3A378> is not a module, class, method, function, traceback, frame, or code object.
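The difference between the two cases can be checked up front instead of waiting for the traceback. The following is a minimal sketch, not a definitive test: the helper names `source_or_none` and `looks_c_implemented` are made up for illustration, and the check relies only on `inspect.isbuiltin` and `inspect.ismethoddescriptor`, which is a heuristic (some Python-level descriptors also count as method descriptors).

```python
import collections
import inspect


def source_or_none(obj):
    """Return the Python source of obj, or None if inspect cannot locate it."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: obj is a built-in (C) object; OSError: source file missing.
        return None


def looks_c_implemented(obj):
    """Heuristic: built-in functions/methods and C method descriptors."""
    return inspect.isbuiltin(obj) or inspect.ismethoddescriptor(obj)


# Hypothetical usage: compare a pure-Python method with two views of list.append.
for candidate in (collections.Counter.most_common, [].append, list.append):
    has_source = source_or_none(candidate) is not None
    print(candidate, "C-implemented:", looks_c_implemented(candidate),
          "source available:", has_source)
```

Both the bound method `[].append` and the unbound descriptor `list.append` are reported as C-implemented, while `Counter.most_common` is an ordinary Python function.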

So my question is: is there any way (perhaps using a third-party package) to find the source code of a class or method implemented in C as well?

i.e., something like this:

>>> print(some_how_or_some_custom_package([].append))
int
PyList_Append(PyObject *op, PyObject *newitem)
{
    if (PyList_Check(op) && (newitem != NULL))
        return app1((PyListObject *)op, newitem);
    PyErr_BadInternalCall();
    return -1;
}

Solution

  • No, there is not. There is no metadata accessible from Python that will let you find the original source file. Such metadata would have to be created explicitly by the Python developers, and it is not clear what benefit that would bring.

    First and foremost, the vast majority of Python installations do not include the C source code. Next, while you could conceivably expect users of the Python language to be able to read Python source code, Python's userbase is very broad, and a large number of users do not know C or are not interested in how the C code works. Finally, even developers who know C can't be expected to read the Python C API documentation, something that quickly becomes a requirement if you want to understand the Python codebase.

    C files do not directly map to a specific output file, unlike Python bytecode cache files and scripts. Unless you create a debug build with a symbol table, the compiler doesn't retain the source filename in the generated object file (.o) it outputs, nor will the linker record what .o files went into the result it produces. Nor do all C files end up contributing to the same executable or dynamic shared object file; some become part of the Python binary, others become loadable extensions, and the mix is configurable and dependent on what external libraries are available at the time of compilation.
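What Python does expose is where a compiled module ended up, not which C files produced it. As a rough sketch (the function name `compiled_location` is hypothetical, and the classification is based only on the module spec's `origin` and the platform's extension suffixes), you can see whether a module was linked into the interpreter binary or built as a loadable extension:

```python
import importlib.machinery
import importlib.util


def compiled_location(module_name):
    """Describe where a module's code lives, based on its import spec."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found"
    origin = spec.origin
    if origin == "built-in":
        return "compiled into the interpreter binary"
    # EXTENSION_SUFFIXES lists the file endings used for extension modules
    # on this platform (for example ".so" or ".pyd" variants).
    if origin and origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return f"loadable extension: {origin}"
    return f"not a C extension: {origin}"


# The answer for "math" depends on how this particular Python was built.
for name in ("sys", "math", "collections"):
    print(name, "->", compiled_location(name))
```

The result for a module such as `math` can differ between installations, which is the configurable mix described above. In neither case does the spec point back to a `.c` file.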

    And between makefiles, setup.py and C preprocessor macros, the combination of input files and the lines of source code actually used to create each output file also varies. Last but not least, because the C source files are no longer consulted at runtime, they can't be expected to still be available in their original location, so even if some metadata were stored you still couldn't map it back to the original.

    So it's easier to remember a few basic rules about how the Python C API works, then map that back to the C code with a few informed code searches.
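One way to start such a search is to derive candidate strings from the object itself. The sketch below is only a heuristic under stated assumptions: it assumes the common CPython habit of naming a method's C function `<type>_<method>` and of quoting the Python-level name in the type's method table. Neither is guaranteed for every function, and the helper `c_search_hints` is invented for this example.

```python
def c_search_hints(obj):
    """Guess strings worth searching for in a CPython source checkout.

    The type_method naming pattern is a convention, not a rule, so treat
    the results as starting points for a search rather than exact symbols.
    """
    qualname = obj.__qualname__              # e.g. "list.append" or "len"
    owner, _, name = qualname.rpartition(".")
    hints = [f'"{name}"']                    # quoted name, as in a method table
    if owner:
        # Methods: assume the <type>_<method> convention for the C function.
        hints.append(f"{owner.replace('.', '_')}_{name}")
    return hints


print(c_search_hints([].append))
print(c_search_hints(len))
```

Searching the source tree for the returned strings (with grep or the repository's code search) narrows things down to a handful of files, after which the C API rules help identify the right definition.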

    Alternatively, download the Python source code and create a debug build, and use a good IDE to help you map symbols and such back to source files. Different compilers, platforms and IDEs have different methods of supporting symbol tables for debugging.