c++ c++17 thread-safety inline

How does C++17 handle thread-safe initialization of inline static data members across multiple translation units?


I've been learning about inline static variables in C++17 and their initialization across multiple translation units. I understand that inline static variables are introduced to maintain a single instance across the program while ensuring thread-safe initialization. However, when you use inline static variables, the compiler seems to add a check to ensure that the variable is initialized in a thread-safe manner. This is necessary because the compiler cannot predict which translation unit will perform the initialization first, due to the inclusion of the header file in multiple translation units.

From what I've gathered, there seems to be some sort of guard mechanism (as seen in the assembly code linked below) that prevents multiple initializations.

So my questions are:

1) In each translation unit, is there a thread-safety guard for initialization at runtime?

2) After the variable is initialized, is it correct that no synchronization primitives are needed when it is accessed or used throughout the program?

https://godbolt.org/z/hhWoK7Y3r

#ifndef THING_H
#define THING_H
#include <string>
class Thing
{
public:
    inline const static std::string name{"Miko"};     
};
#endif // THING_H

Solution

  • The guard variable is used to ensure the constructor is called exactly once when linking multiple translation units. It is NOT used to ensure thread-safe initialization.

    1. In each translation unit, is there a thread-safety guard for initialization during runtime?

    Global variables (namespace-scope variables and static data members) are, on mainstream implementations, initialized sequentially on the main thread before main() starts. So you do not need to worry about thread safety during that phase.

    Well, technically you can start a thread in the constructor of a global variable. Please don't. If you do, you should already know what you are doing and be well aware of the potential consequences.
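For contrast, the construct that the language does guarantee to initialize exactly once in the presence of concurrent callers is a function-local static (since C++11). The sketch below is only an illustration of that guarantee. `Widget`, `shared_widget`, and `construct_from_many_threads` are hypothetical names that do not appear in the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical type that counts how many times it has been constructed.
struct Widget {
    static inline std::atomic<int> constructed{0};
    Widget() { ++constructed; }
};

// Function-local static: if several threads reach this declaration at the
// same time, one of them runs the constructor and the others wait for it.
inline const Widget& shared_widget() {
    static const Widget w;
    return w;
}

// Calls shared_widget() from n threads and reports the constructor count.
inline int construct_from_many_threads(int n) {
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i) {
        pool.emplace_back([] { (void)shared_widget(); });
    }
    for (auto& t : pool) t.join();
    return Widget::constructed.load();
}
```

Calling `construct_from_many_threads(8)` returns 1: the constructor runs once no matter how many threads race to the first call.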

    2. After the variable is initialized, is it correct that no synchronization primitives are needed when it is accessed/used throughout the program?

    When main() starts, all global variables in the executable have been initialized. Whether it is safe to use them without synchronization depends on their types. In general, for standard library types, concurrent reads from multiple threads are safe without additional synchronization, as long as no thread modifies the object at the same time.
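As a minimal sketch of lock-free concurrent reads, the code below reuses `Thing::name` from the question and lets several threads read it through const member functions only. `count_correct_reads` is a hypothetical helper name, and the sketch assumes no thread ever writes to the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct Thing {
    inline const static std::string name{"Miko"};
};

// Starts `readers` threads that each read Thing::name without any lock and
// returns how many of them observed the expected value.
inline std::size_t count_correct_reads(std::size_t readers) {
    std::vector<int> ok(readers, 0);  // one slot per thread, nothing shared
    std::vector<std::thread> pool;
    pool.reserve(readers);
    for (std::size_t i = 0; i < readers; ++i) {
        pool.emplace_back([&ok, i] {
            // Only const operations are performed on the shared object.
            ok[i] = (Thing::name.size() == 4 && Thing::name == "Miko") ? 1 : 0;
        });
    }
    for (auto& t : pool) t.join();
    std::size_t good = 0;
    for (int v : ok) good += static_cast<std::size_t>(v);
    return good;
}
```

`count_correct_reads(8)` returns 8. The object is const and fully constructed before any reader thread is created, so there is nothing to race on.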


    I'm going to dive a little deeper into the implementation details here. Let's say we have the following files:

    test_inline.h

    #pragma once
    
    struct Foo {
        Foo();
    };
    
    struct Thing {
        inline const static Foo foo_static;
    };
    

    a.cxx

    #include "test_inline.h"
    

    b.cxx

    #include "test_inline.h"
    

    main.cxx

    #include "test_inline.h"
    Foo::Foo() {}
    int main() {}
    

    And compile with

    g++ -std=c++23 -O2 -c a.cxx
    g++ -std=c++23 -O2 -c b.cxx
    g++ -std=c++23 -O2 a.o b.o main.cxx -o main_exe
    

    When libc starts up, it loops through and invokes every function pointer in the .init_array section. The final .init_array section of an executable is assembled by the linker from the .init_array sections of all object files.

    If you inspect a.o (compiled from the a.cxx TU), you will see one pointer to .text.startup+0. The compiler generates this function behind the scenes to initialize foo_static.

    ❯ objdump -s -j .init_array a.o -r
    RELOCATION RECORDS FOR [.init_array]:
    OFFSET           TYPE              VALUE
    0000000000000000 R_X86_64_64       .text.startup
    
    Contents of section .init_array:
     0000 00000000 00000000                    ........
    

    Let's look at the function at .text.startup+0:

    ❯ objdump a.o -C -M intel -dr
    
    a.o:     file format elf64-x86-64
    
    
    Disassembly of section .text.startup:
    
    0000000000000000 <_GLOBAL__sub_I_a.cxx>:
       0:   f3 0f 1e fa             endbr64
       4:   80 3d 00 00 00 00 00    cmp    BYTE PTR [rip+0x0],0x0        # b <_GLOBAL__sub_I_a.cxx+0xb>
                            6: R_X86_64_PC32        guard variable for Thing::foo_static-0x5
       b:   74 01                   je     e <_GLOBAL__sub_I_a.cxx+0xe>
       d:   c3                      ret
       e:   48 8d 3d 00 00 00 00    lea    rdi,[rip+0x0]        # 15 <_GLOBAL__sub_I_a.cxx+0x15>
                            11: R_X86_64_PC32       Thing::foo_static-0x4
      15:   c6 05 00 00 00 00 01    mov    BYTE PTR [rip+0x0],0x1        # 1c <_GLOBAL__sub_I_a.cxx+0x1c>
                            17: R_X86_64_PC32       guard variable for Thing::foo_static-0x5
      1c:   e9 00 00 00 00          jmp    21 <_GLOBAL__sub_I_a.cxx+0x21>
                            1d: R_X86_64_PLT32      Foo::Foo()-0x4
    

    It basically says

    if (__guard_variable_for_foo_static == 0) {
        __guard_variable_for_foo_static = 1;
        new (&foo_static) Foo; // construct foo_static using Foo::Foo() constructor
    }
    

    And if we look at the symbol table, both the guard variable and foo_static have the u flag (unique global symbol), which means the linker deduplicates these symbols at link time.

    ❯ objdump -C -t a.o
    ...
    0000000000000000 u     O .bss._ZGVN5Thing10foo_staticE  0000000000000008 guard variable for Thing::foo_static
    0000000000000000 u     O .bss._ZN5Thing10foo_staticE    0000000000000001 Thing::foo_static
    

    b.o (compiled from b.cxx) should be exactly the same as a.o, since a.cxx and b.cxx are identical.


    Now finally, we can look at the final main_exe executable.

    There is indeed only one copy of the guard variable and foo_static in the executable.

    ❯ objdump -C -t main_exe
    ...
    0000000000004020 u     O .bss   0000000000000001              Thing::foo_static
    0000000000004018 u     O .bss   0000000000000008              guard variable for Thing::foo_static
    

    However, if you look at .init_array in main_exe

    ❯ objdump -s -j .init_array main_exe -r
    Contents of section .init_array:
     3dd8 b0110000 00000000 40100000 00000000  ........@.......
     3de8 70100000 00000000 b0100000 00000000  p...............
    

    There are four function pointers here. For now, we focus on the second and third ones (0x1040 and 0x1070).
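The addresses quoted above can be read straight off the dump: each entry is an 8-byte pointer stored in little-endian byte order on x86-64. A small sketch of that decoding follows. `hex_digit` and `decode_le64` are hypothetical helper names, not part of objdump or any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Converts one hexadecimal character to its value (0 for non-hex input,
// which this sketch does not try to diagnose).
constexpr int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}

// Decodes one pointer from 16 hex digits as printed by `objdump -s`
// (with the space between the two 4-byte groups removed).
constexpr std::uint64_t decode_le64(std::string_view hex) {
    if (hex.size() < 16) return 0;  // not a full 8-byte entry
    std::uint64_t value = 0;
    for (std::size_t byte = 0; byte < 8; ++byte) {
        const std::uint64_t b = static_cast<std::uint64_t>(
            hex_digit(hex[2 * byte]) * 16 + hex_digit(hex[2 * byte + 1]));
        value |= b << (8 * byte);  // the first byte in the dump is the lowest
    }
    return value;
}
```

For the second and third entries of the dump, `decode_le64("4010000000000000")` gives 0x1040 and `decode_le64("7010000000000000")` gives 0x1070.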

    ❯ objdump main_exe -C -M intel -dr
    
    0000000000001040 <_GLOBAL__sub_I_a.cxx>:
        1040:       f3 0f 1e fa             endbr64
        1044:       80 3d cd 2f 00 00 00    cmp    BYTE PTR [rip+0x2fcd],0x0        # 4018 <guard variable for Thing::foo_static>
        104b:       74 01                   je     104e <_GLOBAL__sub_I_a.cxx+0xe>
        104d:       c3                      ret
        104e:       48 8d 3d cb 2f 00 00    lea    rdi,[rip+0x2fcb]        # 4020 <Thing::foo_static>
        1055:       c6 05 bc 2f 00 00 01    mov    BYTE PTR [rip+0x2fbc],0x1        # 4018 <guard variable for Thing::foo_static>
        105c:       e9 5f 01 00 00          jmp    11c0 <Foo::Foo()>
        1061:       66 2e 0f 1f 84 00 00    cs nop WORD PTR [rax+rax*1+0x0]
        1068:       00 00 00
        106b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
    
    0000000000001070 <_GLOBAL__sub_I_b.cxx>:
        1070:       f3 0f 1e fa             endbr64
        1074:       80 3d 9d 2f 00 00 00    cmp    BYTE PTR [rip+0x2f9d],0x0        # 4018 <guard variable for Thing::foo_static>
        107b:       74 01                   je     107e <_GLOBAL__sub_I_b.cxx+0xe>
        107d:       c3                      ret
        107e:       48 8d 3d 9b 2f 00 00    lea    rdi,[rip+0x2f9b]        # 4020 <Thing::foo_static>
        1085:       c6 05 8c 2f 00 00 01    mov    BYTE PTR [rip+0x2f8c],0x1        # 4018 <guard variable for Thing::foo_static>
        108c:       e9 2f 01 00 00          jmp    11c0 <Foo::Foo()>
        1091:       66 2e 0f 1f 84 00 00    cs nop WORD PTR [rax+rax*1+0x0]
        1098:       00 00 00
        109b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
    

    There are two duplicate copies of the hidden initialization function: one comes from a.o and the other from b.o.

    So the linker can deduplicate the symbols, but not the contents of .init_array. This explains why the guard variable must be checked to avoid double initialization.
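The whole mechanism can be imitated in portable C++ as a rough model. Everything here is a hypothetical stand-in rather than what the compiler literally emits: `foo_storage` and `foo_guard` play the role of the two deduplicated symbols, `init_from_a` and `init_from_b` play the role of the per-TU initialization functions, and a local array plays the role of .init_array.

```cpp
#include <cassert>
#include <new>

struct Foo {
    static inline int constructed = 0;  // counts constructor calls
    Foo() { ++constructed; }
};

// Stand-ins for the symbols the linker merges into a single copy.
alignas(Foo) inline unsigned char foo_storage[sizeof(Foo)];
inline unsigned char foo_guard = 0;  // plain byte, not atomic

// Stand-in for _GLOBAL__sub_I_a.cxx
inline void init_from_a() {
    if (foo_guard == 0) {
        foo_guard = 1;
        ::new (foo_storage) Foo;  // construct in place
    }
}

// Stand-in for _GLOBAL__sub_I_b.cxx: same body, emitted a second time.
inline void init_from_b() {
    if (foo_guard == 0) {
        foo_guard = 1;
        ::new (foo_storage) Foo;
    }
}

// Stand-in for the startup loop over .init_array: both copies are called.
inline int run_all_initializers() {
    void (*init_array[])() = {init_from_a, init_from_b};
    for (auto fn : init_array) fn();
    return Foo::constructed;
}
```

`run_all_initializers()` returns 1: the second function sees the guard already set and returns without constructing. As in the disassembly, the guard is read and written with ordinary loads and stores, which is enough because the calls happen one after another on a single thread.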

    The same mechanism also works when linking dynamically (shared libraries).