linux opengl egl

EGL Display Handle Lifetime vs Static Object Lifetime


I'm debugging a segfault on exit that I suspect comes from a recent change to our OpenGL-via-EGL off-screen rendering setup. Everything is running on Ubuntu with integrated Intel graphics. The segfault only appears when OpenCV's highgui is included and used. This is probably only the canary in the coal mine, so to speak, and the real issue seems to stem from object lifetimes.

We have an EGLContext class that manages all things EGL. That means choosing a /dev/dri/xyz file handle for which device to use, and then getting a display handle. We use the extensions EGL_EXT_platform_base, EGL_EXT_platform_device, EGL_EXT_device_base, EGL_EXT_device_query and EGL_EXT_device_enumeration.

EGLDisplay Handle Lifetime

This EGLContext used to work fine. However, since a recent optimization it is owned by a static object and is thus destroyed much later than before. In the destructor we call eglDestroyContext() and eglTerminate(), both of which fail with EGL_BAD_DISPLAY. Is there an inherent lifetime to these handles that is shorter than that of static objects? Nowhere in our code do we destroy the display connection before this moment.

EGL_PLATFORM_DEVICE_EXT display handle

While debugging the above issue, I noticed that we get a different display handle on each call to eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, devicedrm, display_attr). The extension docs state:

Multiple calls made to eglGetPlatformDisplayEXT with the same <platform> and <native_display> will return the same EGLDisplay handle.

Does anyone know what could cause this? Am I missing something?


Solution

  • I found the answer to my problem. Here you go, future googler, if you stumble across the same issue.

    Thanks to the docs, I found out about EGL_LOG_LEVEL. Setting it to debug showed me that my display was indeed being terminated while I still wanted to use it.

    Googling led me to the Mesa source, which showed that Mesa registers an atexit callback, and that this callback was terminating my display.

    As much as I was in disbelief, Mesa's atexit callback apparently runs before our static object is destroyed. Since atexit callbacks and static destructors run in reverse order of their registration and construction, this must mean that Mesa registered its callback after our static object was constructed. If our EGLContext were still a normal object, it would have been destroyed long before the atexit callbacks. However, since it's now static, it is in direct "competition" with them.

    I still have to implement the solution, but it will involve either initializing EGL much earlier or registering my own atexit callback. Or something I can't think of yet...

    Edit: The solution was to initialize our EGL state before the static object is constructed. Mesa registers its atexit handlers in a few functions, among them eglBindAPI and eglMakeCurrent.