Tags: python, segmentation-fault, sdl-2, skia, pysdl2

SDL_BlitSurface in PySDL2 causing segfault on larger surfaces


Background

I am creating a window with pysdl2 and using SDL_BlitSurface to embed a skia-python surface inside this window, with the following code:

import skia
import sdl2 as sdl
from ctypes import byref as pointer


class Window:
    DEFAULT_FLAGS = sdl.SDL_WINDOW_SHOWN
    BYTE_ORDER = {
        # ---------- ->   RED        GREEN       BLUE        ALPHA
        "BIG_ENDIAN": (0xff000000, 0x00ff0000, 0x0000ff00, 0x000000ff),
        "LIL_ENDIAN": (0x000000ff, 0x0000ff00, 0x00ff0000, 0xff000000)
    }

    PIXEL_DEPTH = 32  # BITS PER PIXEL
    PIXEL_PITCH_FACTOR = 4  # Multiplied by Width to get BYTES PER ROW

    def __init__(self, title, width, height, x=None, y=None, flags=None, handlers=None):
        self.title = bytes(title, "utf8")
        self.width = width
        self.height = height

        # Center Window By default
        self.x, self.y = x, y
        if x is None:
            self.x = sdl.SDL_WINDOWPOS_CENTERED
        if y is None:
            self.y = sdl.SDL_WINDOWPOS_CENTERED

        # Override flags
        self.flags = flags
        if flags is None:
            self.flags = self.DEFAULT_FLAGS

        # Handlers
        self.handlers = handlers
        if self.handlers is None:
            self.handlers = {}

        # SET RGBA MASKS BASED ON BYTE_ORDER
        is_big_endian = sdl.SDL_BYTEORDER == sdl.SDL_BIG_ENDIAN
        self.RGBA_MASKS = self.BYTE_ORDER["BIG_ENDIAN" if is_big_endian else "LIL_ENDIAN"]

        # CALCULATE PIXEL PITCH
        self.PIXEL_PITCH = self.PIXEL_PITCH_FACTOR * self.width

        # SKIA INIT
        self.skia_surface = self.__create_skia_surface()

        # SDL INIT
        sdl.SDL_Init(sdl.SDL_INIT_EVENTS)  # INITIALIZE SDL EVENTS
        self.sdl_window = self.__create_SDL_Window()

    def __create_SDL_Window(self):
        window = sdl.SDL_CreateWindow(
            self.title,
            self.x, self.y,
            self.width, self.height,
            self.flags
        )
        return window

    def __create_skia_surface(self):
        """
        Initializes the main skia surface that will be drawn upon,
        creates a raster surface.
        """
        surface_blueprint = skia.ImageInfo.Make(
            self.width, self.height,
            ct=skia.kRGBA_8888_ColorType,
            at=skia.kUnpremul_AlphaType
        )
        # noinspection PyArgumentList
        surface = skia.Surface.MakeRaster(surface_blueprint)
        return surface

    def __pixels_from_skia_surface(self):
        """
        Converts Skia Surface into a bytes object containing pixel data
        """
        image = self.skia_surface.makeImageSnapshot()
        pixels = image.tobytes()
        return pixels

    def __transform_skia_surface_to_SDL_surface(self):
        """
        Converts Skia Surface to an SDL surface by first converting
        Skia Surface to Pixel Data using .__pixels_from_skia_surface()
        """
        pixels = self.__pixels_from_skia_surface()
        sdl_surface = sdl.SDL_CreateRGBSurfaceFrom(
            pixels,
            self.width, self.height,
            self.PIXEL_DEPTH, self.PIXEL_PITCH,
            *self.RGBA_MASKS
        )
        return sdl_surface

    def update(self):
        window_surface = sdl.SDL_GetWindowSurface(self.sdl_window)  # the SDL surface associated with the window
        transformed_skia_surface = self.__transform_skia_surface_to_SDL_surface()
        # Transfer skia surface to SDL window's surface
        sdl.SDL_BlitSurface(
            transformed_skia_surface, None,
            window_surface, None
        )

        # Update window with new copied data
        sdl.SDL_UpdateWindowSurface(self.sdl_window)

    def event_loop(self):
        handled_events = self.handlers.keys()
        event = sdl.SDL_Event()

        while True:
            sdl.SDL_WaitEvent(pointer(event))

            if event.type == sdl.SDL_QUIT:
                break

            elif event.type in handled_events:
                self.handlers[event.type](event)


if __name__ == "__main__":
    skiaSDLWindow = Window("Browser Test", 500, 500, flags=sdl.SDL_WINDOW_SHOWN | sdl.SDL_WINDOW_RESIZABLE)
    skiaSDLWindow.event_loop()

I monitor my CPU usage while running the above code and it stays well below 20%, with hardly any change in RAM usage.

Problem

The problem is that as soon as I make a window larger than 690 x 549 (or any other size where the product of width and height is the same), I get a segfault (core dumped), with CPU usage going up to 100% and no change in RAM usage.

What I have already tried/know

I know the fault is in SDL_BlitSurface, as reported by python's faulthandler module and by the classic print("here") lines.
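For reference, enabling that diagnosis is just:

import faulthandler
faulthandler.enable()  # dumps the python traceback when the process
                       # receives a fatal signal such as SIGSEGV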

I am not familiar with languages like C, so from my basic understanding of a segfault I tried matching the size of the byte string returned by Window.__pixels_from_skia_surface (measured with sys.getsizeof) against the sizes of C datatypes, because I suspected an overflow (forgive me if this is the stupidest debugging method you have ever seen). The size didn't come close to that of any C datatype.


Solution

  • As the SDL_CreateRGBSurfaceFrom documentation says, it doesn't allocate memory for the pixel data but takes an external memory buffer passed to it. While there's a benefit in having no copy operation at all, this has lifetime implications - note "you must free the surface before you free the pixel data".

    Python tracks references to its objects and automatically destroys an object once its reference count reaches 0 (i.e. no references to that object remain, so it is deleted immediately). But neither SDL nor skia is a python library, and whatever references they keep in their native code are not exposed to python. So python's automatic memory management doesn't help you here.
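    A tiny illustration of that reference-counting behaviour (Pixels is just a hypothetical stand-in for any buffer-like object):

        import sys

        class Pixels:
            def __del__(self):
                print("buffer freed")

        p = Pixels()
        print(sys.getrefcount(p))  # 2: the name 'p' plus the temporary
                                   # reference getrefcount itself holds
        p = None                   # count drops to 0 and "buffer freed"
                                   # prints immediately - refcount-driven
                                   # deallocation is not deferred to a GC sweep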

    What's happening is: you get the pixel data from skia as a bytes object (a python object, automatically freed once no longer referenced), then pass it to SDL_CreateRGBSurfaceFrom (native code; python doesn't know it keeps an internal reference), and then your pixels go out of scope and python deletes them. You now have a surface, but SDL says that for a surface created this way the pixels must not be destroyed (there are other ways, like SDL_CreateRGBSurface, that do allocate their own memory). Then you try to blit, and the surface still points to the location where the pixels were, but that buffer is no longer there.
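    A minimal sketch of one way to fix it (the _pixels attribute is my own naming - anything that keeps the buffer referenced works): stash the pixel buffer on the Window object so it lives at least as long as the surface, and free the surface before the buffer can go away:

        def __transform_skia_surface_to_SDL_surface(self):
            pixels = self.__pixels_from_skia_surface()
            # Keep a reference for as long as the surface points into this
            # buffer, so refcounting can't free it under SDL's feet.
            self._pixels = pixels
            sdl_surface = sdl.SDL_CreateRGBSurfaceFrom(
                pixels,
                self.width, self.height,
                self.PIXEL_DEPTH, self.PIXEL_PITCH,
                *self.RGBA_MASKS
            )
            return sdl_surface

        def update(self):
            window_surface = sdl.SDL_GetWindowSurface(self.sdl_window)
            skia_sdl_surface = self.__transform_skia_surface_to_SDL_surface()
            sdl.SDL_BlitSurface(skia_sdl_surface, None, window_surface, None)
            # Per the docs, free the surface *before* the pixel data; the
            # buffer itself is only released on the next update(), when
            # self._pixels is rebound.
            sdl.SDL_FreeSurface(skia_sdl_surface)
            sdl.SDL_UpdateWindowSurface(self.sdl_window)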

    [Everything that follows is an explanation of why exactly it didn't crash with a smaller surface size, and that turned out to require many more words than I thought. Sorry. If you're not interested in that stuff, don't read any further.]

    What happens next purely depends on the memory allocator used by python. First, a segmentation fault is a critical signal sent by the operating system to your program, and it happens when you access memory pages in a way you're not supposed to - e.g. reading memory that has no mapped pages, or writing to pages that are mapped read-only. All that, and the means to map/unmap pages, is provided by your operating system kernel (e.g. on linux it is handled by the mmap/munmap calls), but the OS kernel only operates at page level; you can't request half a page, but you can have a large block backed by N pages. For most current operating systems the minimal page size is 4kb; some OSes support 2Mb or even larger 'huge' pages.
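    You can poke at that page granularity from python itself with the standard mmap module. A small demonstration (python's mmap object refuses access after close(), which is exactly the safety net a raw C pointer doesn't have):

        import mmap

        print(mmap.PAGESIZE)                # typically 4096, i.e. 4kb

        buf = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous one-page mapping
        buf[:5] = b"hello"
        buf.close()                         # unmaps the page

        # In C, reading through a pointer into the now-unmapped page is the
        # kind of access that triggers SIGSEGV; python instead guards it:
        try:
            buf[0]
        except ValueError as e:
            print(e)                        # "mmap closed or invalid"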

    So, you get a segmentation fault with the larger surface but not with the smaller one. Meaning: with the larger surface, BlitSurface hits memory that is already unmapped, and the OS sends your program a polite "sorry, can't allow that, correct yourself immediately or you're going down". With the smaller surface, the memory the pixels were kept in is still mapped; that doesn't necessarily mean it still contains the same data (e.g. python could have placed some other object there), but as far as the OS is concerned this memory region is still 'yours' to read. And the difference in behaviour is indeed caused by the size of the allocated buffer (though of course you can't rely on that behaviour holding on another OS, another python version, or even another run with a different set of environment variables).
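    For scale, here's the buffer size at the dimensions where the crash reportedly starts (plain arithmetic, nothing more):

        width, height = 690, 549
        bytes_per_pixel = 4                        # RGBA8888
        print(width * height * bytes_per_pixel)    # 1515240 bytes, ~1.4Mb
        # Well above glibc's 128kb *default* mmap threshold - but, as
        # explained below, the threshold slides upward at runtime, which is
        # why smaller buffers were still served from already-mapped memory.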

    As I've said before, you can only mmap entire pages, but python (that's just an example, as you'll see later) has a lot of small objects (integers, floats, short strings, short arrays, ...) that are much smaller than a page. Allocating an entire page for each of them would be a massive waste of memory (with other problems on top, like reduced performance due to bad caching). To handle that, what we do ('we' being every single program that needs small allocations, i.e. 99% of the programs you use every day) is allocate a larger block of memory and track which parts of that block are allocated/freed entirely in userspace (as opposed to pages, which are tracked by the OS kernel - in kernelspace). That way you get very tight packing of small allocations without much overhead, but the downside is that these allocations are not distinguishable at the OS level.

    When you 'free' a small allocation placed in that kind of pre-allocated block, you just internally mark the region as unused, and the next time some other part of your program requests memory you start searching for a place to put it. It also means you usually don't return (unmap) memory to the OS, as you can't give back a block if even one byte of it is still in use.
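    That reuse is easy to observe from python (it depends on CPython implementation details, so treat this as illustrative only):

        a = [0] * 10
        freed_address = id(a)    # in CPython, id() is the object's address
        del a                    # the slot is marked free internally...
        b = [1] * 10
        print(id(b) == freed_address)   # ...and often handed straight to
                                        # the next similarly-sized object
                                        # (commonly True, but not guaranteed)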

    Python internally manages small objects (<512 bytes) itself, by allocating 256kb blocks and placing objects in those blocks. If a larger allocation is required, it is passed on to libc's malloc (python itself is written in C and uses libc; the most popular libc for linux is glibc). And the malloc documentation for glibc says the following:

    When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3)

    So, allocations for larger objects should go through mmap/munmap, and freeing those pages should make them inaccessible (causing a segfault if you try to access them, instead of silently reading potentially garbage data; bonus points if you try to write into them - so-called memory stomping, overwriting something else, possibly even the internal markers libc uses to track which memory is in use; anything could happen after that). While there is still a chance that the next mmap will randomly place a page at the same address, I'm going to neglect that. Unfortunately this is very old documentation that, while it explains the basic intent, no longer reflects how glibc behaves nowadays. Take a look at this comment in the glibc source (emphasis is mine):

    M_MMAP_THRESHOLD is the request size threshold for using mmap()
    to service a request. Requests of at least this size that cannot be allocated using already-existing space will be serviced via mmap.
    (If enough normal freed space already exists it is used instead.)

    ...

    The implementation works with a sliding threshold, which is by default limited to go between 128Kb and 32Mb (64Mb for 64 bit machines) and starts out at 128Kb as per the 2001 default.

    ...

    The threshold goes up in value when the application frees memory that was allocated with the mmap allocator. The idea is that once the application starts freeing memory of a certain size, it's highly probable that this is a size the application uses for transient allocations.

    So, it tries to adapt to your allocation behaviour to balance performance against releasing memory back to the OS.
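    If you want to make this kind of bug deterministic while debugging, on linux/glibc you can force every allocation through mmap by dropping the threshold to zero (a sketch only - note that setting the threshold manually also disables the sliding behaviour):

        import ctypes
        import ctypes.util

        libc = ctypes.CDLL(ctypes.util.find_library("c"))
        M_MMAP_THRESHOLD = -3          # constant from glibc's malloc.h
        libc.mallopt(M_MMAP_THRESHOLD, 0)
        # From here on, large malloc()s are mmap-backed and each free()
        # unmaps its pages, so a dangling pointer like the freed pixel
        # buffer should fault immediately, at any surface size.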

    But different OSes behave differently, and even just on linux we have multiple libc implementations (e.g. musl) that implement malloc differently, plus a lot of alternative memory allocators (jemalloc, tcmalloc, dlmalloc, you name it) that can be injected via LD_PRELOAD, making your program (e.g. python itself in this case) use a different allocator with different rules on mmap usage. There are even debug allocators that inject "guard" pages around every allocation - pages with no access rights at all (no read, write or execute) - to catch common memory-related programming mistakes, at the cost of massively larger memory usage.

    To sum it up: you had a lifetime management bug in your code, and unfortunately it didn't crash immediately due to the internals of libc's memory allocation scheme, but it did crash once the surface size got large enough that libc decided to allocate exclusive pages for the buffer. That is the unfortunate turn of events that languages without automatic memory management are exposed to, and by virtue of using python C bindings your python program is, to some extent, exposed as well.