Do atomic operations in Go make sure other variables are visible to other threads?

I am confused by this example from the Go memory model, https://golang.org/ref/mem:

var l sync.Mutex
var a string

func f() {
    a = "hello, world"
    l.Unlock()
}

func main() {
    l.Lock()
    go f()
    l.Lock()
    print(a)
}

sync.Mutex implements Lock and Unlock with atomic operations:

Unlock: new := atomic.AddInt32(&m.state, -mutexLocked)

Lock: atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked)

My question is: do atomic.AddInt32 and atomic.CompareAndSwapInt32 act as memory barriers, so that the write to a is visible in other goroutines?

In Java, I know that AtomicInteger gets its memory barriers from a volatile field, which keeps writes visible across threads.

Solution

Test program:

package main

import (
    "sync/atomic"
)

var n uint32

func main() {
    n = 100
    atomic.AddUint32(&n, 1)
}

Inspect the generated assembly with:

go tool compile -S main.go         
"".main STEXT nosplit size=27 args=0x0 locals=0x0 funcid=0x0
    0x0000 00000 (main.go:9)    TEXT    "".main(SB), NOSPLIT|ABIInternal, $0-0
    0x0000 00000 (main.go:9)    FUNCDATA    $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:9)    FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:10)   MOVL    $100, "".n(SB)
    0x000a 00010 (main.go:11)   MOVL    $1, AX
    0x000f 00015 (main.go:11)   LEAQ    "".n(SB), CX
    0x0016 00022 (main.go:11)   LOCK
    0x0017 00023 (main.go:11)   XADDL   AX, (CX)
    0x001a 00026 (main.go:12)   RET
    0x0000 c7 05 00 00 00 00 64 00 00 00 b8 01 00 00 00 48  ......d........H
    0x0010 8d 0d 00 00 00 00 f0 0f c1 01 c3                 ...........
    rel 2+4 t=15 "".n+-4
    rel 18+4 t=15 "".n+0
go.cuinfo.packagename. SDWARFCUINFO dupok size=0
    0x0000 6d 61 69 6e                                      main
""..inittask SNOPTRDATA size=24
    0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0x0010 00 00 00 00 00 00 00 00                          ........
"".n SNOPTRBSS size=4
type..importpath.sync/atomic. SRODATA dupok size=13
    0x0000 00 0b 73 79 6e 63 2f 61 74 6f 6d 69 63           ..sync/atomic
gclocals·33cdeccccebe80329f1fdbee7f5874cb SRODATA dupok size=8
    0x0000 01 00 00 00 00 00 00 00

LOCK is an instruction prefix (here applied to XADDL). Intel's instruction set reference describes it as follows:

Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted. In most IA-32 and all Intel 64 processors, locking may occur without the LOCK# signal being asserted. See the “IA-32 Architecture Compatibility” section below for more details. The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix. The LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory location in shared memory environment. The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields. This instruction’s operation is the same in non-64-bit modes and 64-bit mode.

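So yes. On x86, a LOCK-prefixed instruction also acts as a full memory barrier, so a plain write that precedes the atomic operation (such as the write to a in the question) is visible to any goroutine that later observes the atomic operation's effect. The guarantee does not depend on x86: the Go memory model, as revised for Go 1.19, states that if atomic operation B observes the effect of atomic operation A, then A is synchronized before B.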