Tags: python, c, malloc, python-c-extension

Is there any reason to use malloc over PyMem_Malloc?


I'm reading the documentation for Memory Management in Python C extensions, and as far as I can tell, there doesn't really seem to be much reason to use malloc rather than PyMem_Malloc. Say I want to allocate an array that isn't to be exposed to Python source code and will be stored in an object that will be garbage collected. Is there any reason to use malloc?


Solution

  • EDIT: corrected several places that mixed up PyMem_Malloc and PyObject_Malloc; they are two different calls.

    Without the PYMALLOC_DEBUG macro activated, PyMem_Malloc is an alias of libc's malloc(), with one special case: asking PyMem_Malloc for zero bytes returns a distinct non-NULL pointer whenever possible, while malloc(0) may return either NULL or a valid pointer, depending on the platform (source code reference):

    /* malloc.  Note that nbytes==0 tries to return a non-NULL pointer, distinct
     * from all other currently live pointers.  This may not be possible. */
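This zero-byte guarantee is easy to observe from the running interpreter via ctypes, which exposes the C API of the current process (a quick sketch for illustration, not how you would allocate in a real extension):

```python
import ctypes

# Bind the C-API allocator from the running interpreter. ctypes.pythonapi
# holds the GIL during calls, which the PyMem_ family requires.
api = ctypes.pythonapi
api.PyMem_Malloc.restype = ctypes.c_void_p
api.PyMem_Malloc.argtypes = [ctypes.c_size_t]
api.PyMem_Free.restype = None
api.PyMem_Free.argtypes = [ctypes.c_void_p]

# A zero-byte request still yields a non-NULL pointer.
p = api.PyMem_Malloc(0)
print(p is not None and p != 0)
api.PyMem_Free(p)
```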

    Also, there is an advisory note on the pymem.h header file:

    Never mix calls to PyMem_ with calls to the platform malloc/realloc/ calloc/free. For example, on Windows different DLLs may end up using different heaps, and if you use PyMem_Malloc you'll get the memory from the heap used by the Python DLL; it could be a disaster if you free()'ed that directly in your own extension. Using PyMem_Free instead ensures Python can return the memory to the proper heap. As another example, in PYMALLOC_DEBUG mode, Python wraps all calls to all PyMem_ and PyObject_ memory functions in special debugging wrappers that add additional debugging info to dynamic memory blocks. The system routines have no idea what to do with that stuff, and the Python wrappers have no idea what to do with raw blocks obtained directly by the system routines then.
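In other words, memory must always be returned through the same family that allocated it. A minimal ctypes sketch of the matched pairs (the 64-byte size is arbitrary):

```python
import ctypes

api = ctypes.pythonapi
for name in ("PyMem_Malloc", "PyObject_Malloc"):
    getattr(api, name).restype = ctypes.c_void_p
    getattr(api, name).argtypes = [ctypes.c_size_t]
for name in ("PyMem_Free", "PyObject_Free"):
    getattr(api, name).restype = None
    getattr(api, name).argtypes = [ctypes.c_void_p]

buf = api.PyMem_Malloc(64)
api.PyMem_Free(buf)          # correct: same PyMem_ family

obj_buf = api.PyObject_Malloc(64)
api.PyObject_Free(obj_buf)   # correct: same PyObject_ family
# Never free() these with libc, and never cross the families:
# on Windows the heaps may differ, and in PYMALLOC_DEBUG mode the
# debug headers would confuse the other allocator.
```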

    Then, there are some Python-specific tunings inside PyObject_Malloc, a function used not only by C extensions but for all the dynamic allocations made while running a Python program, like 100*234, str(100) or 10 + 4j:

    >>> id(10 + 4j)
    139721697591440
    >>> id(10 + 4j)
    139721697591504
    >>> id(10 + 4j)
    139721697591440
    

    The previous complex() instances are small objects allocated on a dedicated pool.

    Small object (<256 bytes) allocation with PyObject_Malloc is quite efficient since it is served from pools of 8-byte-aligned blocks, with one pool per block size. Pools are themselves carved out of larger pages and arenas, and requests above the small-object threshold fall back to the platform allocator.
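The alignment of pymalloc blocks can be spot-checked from ctypes as well; note that the exact alignment is an implementation detail (at least 8 bytes, and 16 in recent CPython versions):

```python
import ctypes

api = ctypes.pythonapi
api.PyObject_Malloc.restype = ctypes.c_void_p
api.PyObject_Malloc.argtypes = [ctypes.c_size_t]
api.PyObject_Free.restype = None
api.PyObject_Free.argtypes = [ctypes.c_void_p]

# Small requests are served from pools of aligned blocks,
# one size class per pool.
ptrs = [api.PyObject_Malloc(n) for n in (1, 24, 100, 255)]
print(all(p % 8 == 0 for p in ptrs))
for p in ptrs:
    api.PyObject_Free(p)
```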

    This comment on the source code explains how the PyObject_Malloc call is optimized:

    /*
     * The basic blocks are ordered by decreasing execution frequency,
     * which minimizes the number of jumps in the most common cases,
     * improves branching prediction and instruction scheduling (small
     * block allocations typically result in a couple of instructions).
     * Unless the optimizer reorders everything, being too smart...
     */
    

    Pools, pages and arenas are optimizations intended to reduce the external memory fragmentation of long-running Python programs.
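You can inspect the current state of those pools and arenas with sys._debugmallocstats(), a CPython-specific helper. It writes to the C-level stderr stream (not sys.stderr), so this sketch captures it from a child interpreter; the output format varies across versions:

```python
import subprocess
import sys

# Run sys._debugmallocstats() in a child interpreter and capture its
# C-level stderr, where pymalloc's arena/pool/block statistics are dumped.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys._debugmallocstats()"],
    capture_output=True, text=True,
)
print(result.stderr.splitlines()[0])
```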

    Check out the source code for the full detailed documentation on Python's memory internals.