c++ gcc pthreads clang stdmutex

Why do functions using std::mutex make a null check of the address of pthread_key_create?


Take this simple function that increments an integer under a lock implemented by std::mutex:

#include <mutex>

std::mutex m;

void inc(int& i) {
    std::unique_lock<std::mutex> lock(m);
    i++;
}

I would expect this (after inlining) to compile in a straightforward way to a call of m.lock(), an increment of i, and then a call of m.unlock().

Checking the generated assembly for recent versions of gcc and clang, however, we see an extra complication. Taking the gcc version first:

inc(int&):
  mov eax, OFFSET FLAT:__gthrw___pthread_key_create(unsigned int*, void (*)(void*))
  test rax, rax
  je .L2
  push rbx
  mov rbx, rdi
  mov edi, OFFSET FLAT:m
  call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
  test eax, eax
  jne .L10
  add DWORD PTR [rbx], 1
  mov edi, OFFSET FLAT:m
  pop rbx
  jmp __gthrw_pthread_mutex_unlock(pthread_mutex_t*)
.L2:
  add DWORD PTR [rdi], 1
  ret
.L10:
  mov edi, eax
  call std::__throw_system_error(int)

It's the first few lines that are interesting. The generated code tests the address of __gthrw___pthread_key_create (gcc's alias for glibc's __pthread_key_create, the internal symbol behind pthread_key_create, a function that creates a thread-local storage key). If the address is zero, it branches to .L2, which performs the increment with a single instruction and no locking at all.

If the address is non-zero, it proceeds as expected: locking the mutex, doing the increment, and unlocking.

clang does even more: it checks the address of the function twice, once before the lock and once before the unlock:

inc(int&): # @inc(int&)
  push rbx
  mov rbx, rdi
  mov eax, __pthread_key_create
  test rax, rax
  je .LBB0_4
  mov edi, m
  call pthread_mutex_lock
  test eax, eax
  jne .LBB0_6
  inc dword ptr [rbx]
  mov eax, __pthread_key_create
  test rax, rax
  je .LBB0_5
  mov edi, m
  pop rbx
  jmp pthread_mutex_unlock # TAILCALL
.LBB0_4:
  inc dword ptr [rbx]
.LBB0_5:
  pop rbx
  ret
.LBB0_6:
  mov edi, eax
  call std::__throw_system_error(int)

What's the purpose of this check?

Perhaps it is there to support the case where the object file is ultimately linked into a binary without pthreads support, falling back to a version without locking in that case? I couldn't find any documentation on this behavior.


Solution

  • Your guess looks to be correct. From the libgcc/gthr-posix.h file in gcc's source repository (https://github.com/gcc-mirror/gcc.git):

    /* For a program to be multi-threaded the only thing that it certainly must
       be using is pthread_create.  However, there may be other libraries that
       intercept pthread_create with their own definitions to wrap pthreads
       functionality for some purpose.  In those cases, pthread_create being
       defined might not necessarily mean that libpthread is actually linked
       in.
    
       For the GNU C library, we can use a known internal name.  This is always
       available in the ABI, but no other library would define it.  That is
       ideal, since any public pthread function might be intercepted just as
       pthread_create might be.  __pthread_key_create is an "internal"
       implementation symbol, but it is part of the public exported ABI.  Also,
       it's among the symbols that the static libpthread.a always links in
       whenever pthread_create is used, so there is no danger of a false
       negative result in any statically-linked, multi-threaded program.
    
       For others, we choose pthread_cancel as a function that seems unlikely
       to be redefined by an interceptor library.  The bionic (Android) C
       library does not provide pthread_cancel, so we do use pthread_create
       there (and interceptor libraries lose).  */
    
    #ifdef __GLIBC__
    __gthrw2(__gthrw_(__pthread_key_create),
         __pthread_key_create,
         pthread_key_create)
    # define GTHR_ACTIVE_PROXY  __gthrw_(__pthread_key_create)
    #elif defined (__BIONIC__)
    # define GTHR_ACTIVE_PROXY  __gthrw_(pthread_create)
    #else
    # define GTHR_ACTIVE_PROXY  __gthrw_(pthread_cancel)
    #endif
    
    static inline int
    __gthread_active_p (void)
    {
      static void *const __gthread_active_ptr
        = __extension__ (void *) &GTHR_ACTIVE_PROXY;
      return __gthread_active_ptr != 0;
    }
    

    Throughout the remainder of the file, many of the pthread wrappers first check __gthread_active_p(). If __gthread_active_p() returns 0, the wrapper does nothing and returns success.
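
As a rough illustration of that pattern, here is a minimal sketch that does not use pthreads at all. Everything in it is made up for the example: hypothetical_threading_marker stands in for __pthread_key_create, FakeMutex stands in for pthread_mutex_t, and guarded_lock/guarded_unlock stand in for the gthread wrappers. It assumes gcc or clang on an ELF platform, where the address of an undefined weak symbol is null.

```cpp
#include <cassert>

// Hypothetical marker symbol (an assumption for this sketch). It plays the
// role of __pthread_key_create: it is declared weak, so if nothing in the
// final link defines it, its address is null instead of a link error.
extern "C" void hypothetical_threading_marker(void) __attribute__((weak));

// Same shape as __gthread_active_p(): take the address, compare it to null.
static inline bool threading_active() {
    static void* const marker =
        reinterpret_cast<void*>(&hypothetical_threading_marker);
    return marker != nullptr;
}

// Stand-in for pthread_mutex_t that only counts calls.
struct FakeMutex {
    int lock_calls = 0;
    int unlock_calls = 0;
};

// Stand-ins for the real pthread_mutex_lock / pthread_mutex_unlock.
static int real_lock(FakeMutex* m)   { ++m->lock_calls;   return 0; }
static int real_unlock(FakeMutex* m) { ++m->unlock_calls; return 0; }

// Guarded wrappers: do nothing and report success when threading is inactive.
static inline int guarded_lock(FakeMutex* m) {
    return threading_active() ? real_lock(m) : 0;
}
static inline int guarded_unlock(FakeMutex* m) {
    return threading_active() ? real_unlock(m) : 0;
}

// Counterpart of inc() from the question, using the guarded wrappers.
void inc(int& i, FakeMutex& m) {
    int rc = guarded_lock(&m);
    assert(rc == 0);  // the assembly above calls std::__throw_system_error here
    (void)rc;         // avoid an unused-variable warning under NDEBUG
    ++i;
    guarded_unlock(&m);
}
```

After inlining, inc() reduces to a null test on the marker's address followed by either the locked or the unlocked increment, which is the shape of the assembly in the question. If some object in the link defines hypothetical_threading_marker, the locking path is taken instead.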