Tags: c++, c++11, opengl, valgrind, nvidia

Valgrind: libnvidia-glcore.so.346.47 Conditional jump or move depends on uninitialised value


When running my test C++ app against my dynamic library, which links against NVIDIA's libGL.so, I get the following errors (see below) reported by Valgrind. I am tempted to suppress them, but I am not sure whether this is my issue or something inside libnvidia-glcore.so. Part of the uncertainty stems from not fully understanding Valgrind's output. I have looked into which variables might be uninitialized in my code in the call to glXCreateContextAttribsARB, but I do not see any there. If the output suggests this is my issue, what kinds of things should I be looking for? The two errors I am getting are:

==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEEADC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F75DA1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F775D3: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F760F5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E353: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x4E535F2: opengl_core::render_system::init() (x11_render_system.cpp:92)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DF085F: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4B78B: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4CFBC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4BFE0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F38ED5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7B20F52: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E2CB: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

As per request:

 // src/x11_render_system.cpp
 91       m_impl->m_context.make_current(m_impl->m_window);
 92       glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 93       glClearColor(1.0, 0.0, 0.0, 1.0);  
 94       glXSwapBuffers(display, window);   
 95       m_impl->m_context.make_not_current();

Solution

  • Valgrind is quite prone to false positives with low-level hardware drivers (such as GPU drivers) because of the way they work. These drivers access the GPU's memory (and even its registers) from user space through mappings into the process's virtual address space, which the kernel driver sets up (this is POSIX mmap at work). This way, the driver can read the device's registers at ordinary addresses, like any other variable.

    The point is that some of the device's registers are only meant to be read. For example, they may reflect some status of the device. Thus, only the device has a reason to write them (and even if the CPU tried to, the write would fail silently). Most of the time, the device writes them internally at power-up and again whenever its status changes, and the values become visible to user space once the mapping is set up. In essence, these are pure volatile variables... even more volatile than the usual thread-to-thread notion of volatility, which, by the way, Valgrind handles well since it emulates the CPU.

    But Valgrind lives in a deterministic world (CPU and RAM), and these GPU registers are completely outside that world. When the driver reads them, Valgrind simply thinks it is accessing RAM (because of mmap), which is definitely not true. Thus, at the point where the driver branches on the data it has read, Valgrind reports an error because nothing in its world ever wrote that data.

    Let's be honest: proprietary drivers are not open source, so it's hard to know what is really happening, but it is likely something similar. What I can say for sure is that this has been happening with Valgrind and GPU drivers for ages (even with very small programs), mainly during initialization, and the general consensus is that these are false positives. Thus, you can safely ignore them... or create a suppression file for Valgrind in your project (let's name it valgrind.supp):

    {
      NVidia-driver
      Memcheck:Cond
      obj:/usr/lib64/nvidia/libnvidia-glcore.so.346.47
    }
    

    Then you call Valgrind with the option --suppressions=valgrind.supp and it will no longer report these false positives.

    You may see other driver objects in similar reports; just add an entry for each of them (repeat the whole {...} block and change the obj: line to match what Valgrind reports). Because the file name contains the driver version, the entries stop matching after a driver update unless you update them too; Valgrind accepts the * and ? wildcards in obj: lines, which avoids this.
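
    As a sketch of the wildcard variant (the suppression names are arbitrary labels, and the libGL entry is an assumption that only makes sense if Valgrind also reports errors whose top frame is in NVIDIA's libGL.so):

    {
      NVidia-glcore-any-version
      Memcheck:Cond
      obj:/usr/lib64/nvidia/libnvidia-glcore.so.*
    }
    {
      NVidia-libGL-any-version
      Memcheck:Cond
      obj:/usr/lib64/nvidia/libGL.so.*
    }

    Running Valgrind with --gen-suppressions=all prints a ready-made suppression block after each error, which can be pasted into the file and then generalized like this.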

    See the "Suppressing errors" section of the Valgrind manual for more information on this feature.
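
As for what to look for if the problem were in your own code: Memcheck raises this error when a branch depends on memory that was allocated but never written. In the traces above, both the allocation (malloc called from inside libGL.so) and the branch are in driver frames, which points away from your code. For contrast, here is a minimal sketch of how the same pattern is avoided in application code. The struct and function names are hypothetical and are not taken from the code in the question.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical settings struct, used only for illustration.
struct context_settings {
    int major_version;
    int debug;
};

// Allocates with calloc so every byte is zero before the first read.
// If std::malloc were used here instead, `debug` would be left
// indeterminate, and the branch in wants_debug() would be the kind of
// "conditional jump depends on uninitialised value" Memcheck reports.
context_settings* make_settings(int major_version) {
    auto* s = static_cast<context_settings*>(
        std::calloc(1, sizeof(context_settings)));
    if (s != nullptr) {
        s->major_version = major_version;
    }
    return s;
}

// Branches on a value that lives on the heap.
bool wants_debug(const context_settings* s) {
    assert(s != nullptr);
    return s->debug != 0;
}
```

A caller would do something like `context_settings* s = make_settings(3);`, test `wants_debug(s)`, and release the memory with `std::free(s)`. In GLX code specifically, the usual suspects of this kind are attribute arrays and structs that are passed to the API before every element has been set.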