I've been learning about inline static variables in C++17 and their initialization across multiple translation units. I understand that inline static variables were introduced to maintain a single instance across the program while ensuring thread-safe initialization. However, when you use inline static variables, the compiler seems to add a check to ensure that the variable is initialized in a thread-safe manner. This is necessary because the compiler cannot predict which translation unit will perform the initialization first, since the header file is included in multiple translation units.
From what I've gathered, there seems to be some sort of guard mechanism (as seen in the assembly code linked below) that prevents multiple initializations.
So my questions are:
1) In each translation unit, is there a thread-safety guard for initialization during runtime?
2) After the variable is initialized, is it correct that no synchronization primitives are needed when it is accessed/used throughout the program?
https://godbolt.org/z/hhWoK7Y3r
#ifndef THING_H
#define THING_H
#include <string>
class Thing
{
public:
inline const static std::string name{"Miko"};
};
#endif // THING_H
The guard variable is used to ensure the constructor is called exactly once when linking multiple translation units. It is NOT used to ensure thread-safe initialization.
- In each translation unit, is there a thread-safety guard for initialization during runtime?
Global variables (namespace-scope variables and class static data members) are initialized sequentially on the main thread, before main() starts. So you do not need to worry about thread safety before main() starts.
Well, technically you can start a thread in the constructor of a global variable. Please don't do it. If you do, you should already know what you are doing and be well aware of the potential consequences.
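To make the "initialized before main() starts" point concrete, here is a minimal sketch. The names Tracker, Holder and tracker_ready are hypothetical and used only for illustration; the constructor simply records that it ran.

```cpp
#include <cassert>  // only needed for the assert in the usage note below

// Hypothetical type whose constructor records that it ran.
struct Tracker {
    bool ready = false;
    Tracker() { ready = true; }
};

struct Holder {
    // Same kind of declaration as Thing::name in the question.
    inline static Tracker tracker;
};

// Reports whether Holder::tracker has already been constructed.
bool tracker_ready() { return Holder::tracker.ready; }
```

Calling assert(tracker_ready()); as the first statement of main() should pass, since the constructor has already run by the time the variable is used there.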
- After the variable is initialized, is it correct that no synchronization primitives are needed when it is accessed/used throughout the program?
When main() starts, it's guaranteed that all global variables are initialized.
Whether it's safe to use these variables without synchronization depends on their implementation.
In general, for standard library types, it is safe to read concurrently from multiple threads (that is, to call only const member functions) without additional synchronization, as long as no thread modifies the object.
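As a rough illustration of lock-free concurrent reads, the sketch below has several threads read Thing::name from the question at the same time. The helper read_name_concurrently is a hypothetical name, and the sketch assumes nothing ever writes to the string after initialization (it is const here).

```cpp
#include <cassert>  // only needed for the assert in the usage note below
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct Thing {
    inline const static std::string name{"Miko"};
};

// Spawns num_threads readers. Each one reads Thing::name without a lock and
// stores the observed length; the function returns the sum of those lengths.
std::size_t read_name_concurrently(std::size_t num_threads) {
    std::vector<std::size_t> lengths(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
        // Each thread writes only to its own slot, so the vector needs no lock.
        workers.emplace_back([&lengths, i] { lengths[i] = Thing::name.size(); });
    }
    for (auto& t : workers) t.join();

    std::size_t total = 0;
    for (std::size_t n : lengths) total += n;
    return total;
}
```

For example, assert(read_name_concurrently(4) == 16); should hold, because each of the four threads sees the 4-character string. On some toolchains you may need to pass -pthread when compiling.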
I'm going to dive a little deeper into the implementation details here. Let's say we have a setup with the following files:
test_inline.h
#pragma once
struct Foo {
Foo();
};
struct Thing {
inline const static Foo foo_static;
};
a.cxx
#include "test_inline.h"
b.cxx
#include "test_inline.h"
main.cxx
#include "test_inline.h"
Foo::Foo() {}
int main() {}
And compile with
g++ -std=c++23 -O2 -c a.cxx
g++ -std=c++23 -O2 -c b.cxx
g++ -std=c++23 -O2 a.o b.o main.cxx -o main_exe
When libc starts up, it loops through and invokes all function pointers in the .init_array section. The final .init_array section in an executable file is gathered from all object files by the linker.
If you inspect a.o (compiled from the a.cxx TU), you will see one pointer to .text.startup+0. This function is generated behind the scenes by the compiler to initialize foo_static.
❯ objdump -s -j .init_array a.o -r
RELOCATION RECORDS FOR [.init_array]:
OFFSET TYPE VALUE
0000000000000000 R_X86_64_64 .text.startup
Contents of section .init_array:
0000 00000000 00000000 ........
Let's look at the function at .text.startup+0:
❯ objdump a.o -C -M intel -dr
a.o: file format elf64-x86-64
Disassembly of section .text.startup:
0000000000000000 <_GLOBAL__sub_I_a.cxx>:
0: f3 0f 1e fa endbr64
4: 80 3d 00 00 00 00 00 cmp BYTE PTR [rip+0x0],0x0 # b <_GLOBAL__sub_I_a.cxx+0xb>
6: R_X86_64_PC32 guard variable for Thing::foo_static-0x5
b: 74 01 je e <_GLOBAL__sub_I_a.cxx+0xe>
d: c3 ret
e: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 15 <_GLOBAL__sub_I_a.cxx+0x15>
11: R_X86_64_PC32 Thing::foo_static-0x4
15: c6 05 00 00 00 00 01 mov BYTE PTR [rip+0x0],0x1 # 1c <_GLOBAL__sub_I_a.cxx+0x1c>
17: R_X86_64_PC32 guard variable for Thing::foo_static-0x5
1c: e9 00 00 00 00 jmp 21 <_GLOBAL__sub_I_a.cxx+0x21>
1d: R_X86_64_PLT32 Foo::Foo()-0x4
It basically says
if (__guard_variable_for_foo_static == 0) {
__guard_variable_for_foo_static = 1;
new (&foo_static) Foo; // construct foo_static using Foo::Foo() constructor
}
And if we look at the symbol table, both the guard variable and the foo_static variable have the u flag (a GNU unique global symbol), which means the linker will deduplicate the symbol at link time.
❯ objdump -C -t a.o
...
0000000000000000 u O .bss._ZGVN5Thing10foo_staticE 0000000000000008 guard variable for Thing::foo_static
0000000000000000 u O .bss._ZN5Thing10foo_staticE 0000000000000001 Thing::foo_static
b.o (compiled from b.cxx) should be exactly the same as a.o, since a.cxx and b.cxx are identical.
Now, finally, we can look at the final main_exe executable.
There is indeed only one copy of the guard variable and of foo_static in the executable.
❯ objdump -C -t main_exe
...
0000000000004020 u O .bss 0000000000000001 Thing::foo_static
0000000000004018 u O .bss 0000000000000008 guard variable for Thing::foo_static
However, look at .init_array in main_exe:
❯ objdump -s -j .init_array main_exe -r
Contents of section .init_array:
3dd8 b0110000 00000000 40100000 00000000 ........@.......
3de8 70100000 00000000 b0100000 00000000 p...............
There are 4 function pointers here, each stored as a little-endian 64-bit address. We focus on the 2nd and 3rd ones for now (0x1040 and 0x1070).
❯ objdump main_exe -C -M intel -dr
0000000000001040 <_GLOBAL__sub_I_a.cxx>:
1040: f3 0f 1e fa endbr64
1044: 80 3d cd 2f 00 00 00 cmp BYTE PTR [rip+0x2fcd],0x0 # 4018 <guard variable for Thing::foo_static>
104b: 74 01 je 104e <_GLOBAL__sub_I_a.cxx+0xe>
104d: c3 ret
104e: 48 8d 3d cb 2f 00 00 lea rdi,[rip+0x2fcb] # 4020 <Thing::foo_static>
1055: c6 05 bc 2f 00 00 01 mov BYTE PTR [rip+0x2fbc],0x1 # 4018 <guard variable for Thing::foo_static>
105c: e9 5f 01 00 00 jmp 11c0 <Foo::Foo()>
1061: 66 2e 0f 1f 84 00 00 cs nop WORD PTR [rax+rax*1+0x0]
1068: 00 00 00
106b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000001070 <_GLOBAL__sub_I_b.cxx>:
1070: f3 0f 1e fa endbr64
1074: 80 3d 9d 2f 00 00 00 cmp BYTE PTR [rip+0x2f9d],0x0 # 4018 <guard variable for Thing::foo_static>
107b: 74 01 je 107e <_GLOBAL__sub_I_b.cxx+0xe>
107d: c3 ret
107e: 48 8d 3d 9b 2f 00 00 lea rdi,[rip+0x2f9b] # 4020 <Thing::foo_static>
1085: c6 05 8c 2f 00 00 01 mov BYTE PTR [rip+0x2f8c],0x1 # 4018 <guard variable for Thing::foo_static>
108c: e9 2f 01 00 00 jmp 11c0 <Foo::Foo()>
1091: 66 2e 0f 1f 84 00 00 cs nop WORD PTR [rax+rax*1+0x0]
1098: 00 00 00
109b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
There are two duplicated copies of the hidden initialization function: one comes from a.o and the other comes from b.o.
So the linker can deduplicate the symbols, but not the .init_array content. This explains why we need to check the guard variable to avoid double initialization.
The same mechanism also works when linking dynamically (shared libraries).
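The whole mechanism can be imitated in a single file. This is only a sketch of the idea, not what the compiler literally emits: foo_storage and foo_guard stand in for the two deduplicated symbols, the global_sub_I_* functions stand in for the per-TU initializers that end up in .init_array, and run_init_array is a hypothetical stand-in for the libc startup loop. Foo also gets a counter here, which the original Foo does not have, so the number of constructor calls can be observed.

```cpp
#include <cassert>  // only needed for the assert in the usage note below
#include <new>

struct Foo {
    static inline int constructed = 0;  // counts constructor calls (illustration only)
    Foo() { ++constructed; }
};

// Stand-ins for the two symbols the linker deduplicates:
// the object's storage and its guard variable.
alignas(Foo) static unsigned char foo_storage[sizeof(Foo)];
static unsigned char foo_guard = 0;

// Same logic as the disassembled initializer: construct only if the guard is 0.
static void guarded_init() {
    if (foo_guard == 0) {
        foo_guard = 1;
        new (foo_storage) Foo;  // placement-new into the shared storage
    }
}

// One initializer per translation unit; the linker keeps all of them.
void global_sub_I_a() { guarded_init(); }
void global_sub_I_b() { guarded_init(); }
void global_sub_I_main() { guarded_init(); }

// Imitates the startup loop over .init_array and returns the constructor count.
int run_init_array() {
    void (*init_array[])() = {global_sub_I_a, global_sub_I_b, global_sub_I_main};
    for (auto fn : init_array) fn();
    return Foo::constructed;
}
```

With this sketch, assert(run_init_array() == 1); should hold: three initializers run, but only the first one finds the guard at 0 and calls the constructor. Note that the guard is a plain byte with no atomics or locks, which matches the point that it protects against duplicate initializers rather than against concurrent threads.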