What is the right way to write double-checked locking in Rust?

I found this article, but it looks wrong because Cell does not guarantee a synchronization between the set() under a lock and the get() over the lock.

Does Atomic_.store(true, Ordering::Release) affect other non-atomic write operations?

I tried to write it with AtomicPtr which looks close to Java-style but it failed. I couldn't find examples of the correct using of AtomicPtr in such cases.

Solution

Does Atomic_.store(true, Ordering::Release) affect other non-atomic write operations?

Yes.

Actually, the primary reason Ordering exists is to impose some ordering guarantees on non-atomic reads and writes:

within a same thread of execution, for both compiler and CPU,
so that other threads have guarantees in the order in which they will see the changes.

Relaxed

The less constraining Ordering; the only operations which cannot be reordered are operations on the same atomic value:

atomic.set(4, Ordering::Relaxed);
other = 8;
println!("{}", atomic.get(Ordering::Relaxed));

is guaranteed to print 4. If another thread reads that atomic is 4, it has no guarantee about whether other is 8 or not.

Release/Acquire

Write and read barriers, respectively:

Release is to be used with store operations, and guarantees that prior writes are executed,
Acquire is to be used with load operations, and guarantees that further reads will see values at least as fresh as the ones written prior to the corresponding store.

So:

// thread 1
one = 1;
atomic.set(true, Ordering::Release);
two = 2;

// thread 2
while !atomic.get(Ordering::Acquire) {}

println!("{} {}", one, two);

guarantees that one is 1, and says nothing about two.

Note that a Relaxed store with an Acquire load or a Release store with a Relaxed load are essentially meaningless.

Note that Rust provides AcqRel: it behaves as Release for stores and Acquire for loads, so you don't have to remember which is which... I do not recommend it though, since the guarantees provided are so different.

SeqCst

The most constraining Ordering. Guarantees ordering across all threads at once.

What is the right way to write double-checked locking in Rust?

So, double-checked locking is about taking advantage of those atomic operations to avoid locking when unnecessary.

The idea is to have 3 pieces:

a flag, initially false, and true once the action has been executed,
a mutex, to guarantee exclusion during initialization,
a value, to be initialized.

And use them as such:

if the flag is true, value already initialized,
otherwise, lock the mutex,
if the flag still false: initialize and set the flag to true,
release the lock, the value is now initialized.

The difficulty is ensuring that the non-atomic reads/writes are correctly ordered (and become visible in the correct order). In theory, you would need full fences for that; in practice following the idioms of the C11/C++11 memory models will be sufficient since compilers must make it work.

Let's examine the code first (simplified):

struct Lazy<T> {
    initialized: AtomicBool,
    lock: Mutex<()>,
    value: UnsafeCell<Option<T>>,
}

impl<T> Lazy<T> {
    pub fn get_or_create<'a, F>(&'a self, f: F) -> &'a T
    where
        F: FnOnce() -> T
    {
        if !self.initialized.load(Ordering::Acquire) { // (1)
            let _lock = self.lock.lock().unwrap();

            if !self.initialized.load(Ordering::Relaxed) { // (2)
                let value = unsafe { &mut *self.value.get() };
                *value = Some(f(value));
                self.initialized.store(true, Ordering::Release); // (3)
            }
        }

        unsafe { &*self.value.get() }.as_ref().unwrap()
    }
}

There are 3 atomic operations, numbered via comments. We can now check which kind of guarantee on memory ordering each must provide for correctness.

(1) if true, a reference to the value is returned, which must reference valid memory. This requires that the writes to this memory be executed before the atomic turns true, and the reads of this memory be executed only after it is true. Thus (1) requires Acquire and (3) requires Release.

(2) on the other hand has no such constraint because locking a Mutex is equivalent to a full memory barrier: all writes are guaranteed to have occured before and all reads will only occur after. As such, there is no further guarantee needed for this load, so Relaxed is the most optimized.

Thus, as far as I am concerned, this implementation of double-checking looks correct in practice.

For further reading, I really recommend the article by Preshing which is linked in the piece you linked. It notably highlights the difference between the theory (fences) and practice (atomic loads/stores which are lowered to fences).