I know Singleton is not good and that in general the thread safe lazy initialization in modern C++ can be done by static local variable, but I have quite specific use case and although I can use safer "less-performant" techniques(double check with mutex for the second check), I still would love to verify my understandings on atomic variables and memory ordering.
Also the main reason I cannot use static local variable is because it needs to be shared between DLL/SO boundaries. - Please ignore that statement and just accept that I want to do the initialization myself.
This is the code which came to my mind as the most performant lock-free solution without using stronger memory barriers. It seems to work, but I do wonder if that is really the case.
Is my usage of memory ordering correct here? Will it have the expected behavior?
// in header file
class DLL_EXPORT MyLogger
{
public:
void logMessage(std::string message);
private:
static std::atomic<MyLogger*> sSharedInstance;
}
// in cpp file
constinit std::atomic<MyLogger*> MyLogger::sSharedInstance{nullptr};
MyLogger& MyLogger::sharedInstance()
{
MyLogger* logger = sSharedInstance.load(std::memory_order_relaxed);
if (!logger)
{
logger = new MyLogger(); // The constructor is no-op, so redundant instances are not an issue.
MyLogger* previousLogger = nullptr;
if (!sSharedInstance.compare_exchange_strong(previousLogger, logger, std::memory_order_relaxed, std::memory_order_relaxed))
{
// Lost the init race - destroy the redundant logger
delete logger;
logger = previousLogger;
}
}
return *logger;
}
// in different threads
void someFunction();
{
// Can be called from many threads...
MyLogger::sharedInstance().logMessage("some message");
}
After tons of research and discussion and thanks to the answer of @user17732522 and the really good comments from @PeterCordes I finally understand the issues in the code from my question. There are two changes which have to be made. The load operation have to become std::memory_order_consume
and the third parameter(success case) of compare_exchange_strong
call should become std::memory_order_release
.
I will try to explain as much as I can why and how things happens here, but first lets establish some base ground:
An example issue about the compare_exchange_strong
in my question is that the compiler can reorder the object creation(as @user17732522 pointed out). So imagine the following theoretical scenario:
logger
variable. However, the constructor still might not be called as the compiler and/or CPU can reorder that operation, because of the relaxed semantics and because the compare_exchange_strong
only rely on the pointer address, not on its object state.To the rescue comes std::memory_order_consume
for the load operation and std::memory_order_release
for the success case of the compare-exchange. The release semantics make it so no operations can be reordered after it, thus guaranteeing that both the local variable logger
gets the memory address and the construction of the MyLogger
instance is completed, thus when the pointer address is written to sSharedInstance
by the compare_exchange_strong()
call it is guaranteed that the object will be in a fully constructed state.
We can leave the failure case of compare_exchange_strong()
to std::memory_order_relaxed
, because it uses a local stack variable which of course can be seen only by a single thread.
Now, initially I thought that at least the load operation can be kept with relaxed memory ordering, but thanks to @PeterCordes now I understand that due to some specific CPUs(notably DEC Alpha), even though the constuctor have finished execution, before the assignment of the new address to sSharedInstance
another thread might be able to see the new pointer address but still not observe the object values which were changed by the MyLogger() constuctor... This is why for the load operation at least consume semantics are needed.
NOTE: std::memory_order_consume
is temporary discouraged by the standard due to its complicated nature. Most compilers automatically promote it to std::memory_order_acquire
, but in order to have extra safety, it would be better to use std::memory_order_acquire
in the code too. For more info. check out this link: https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Consume_ordering