We are using a Zynq-7000 based SoC, i.e. a Cortex-A9, and we ran into the following issue with the atomic_flags used inside a library we depend on (open-amp).
We are using the second CPU on the SoC to execute bare-metal code.
With the dcache disabled, atomic flags can no longer be set. Here is a simple program that triggers the issue for us:
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define XREG_CONTROL_DCACHE_BIT (0X00000001U<<2U)
#define XREG_CP15_SYS_CONTROL "p15, 0, %0, c1, c0, 0"

/* read a CP15 register */
#define mfcp(rn) ({uint32_t rval = 0U; \
    __asm__ __volatile__(\
        "mrc " rn "\n"\
        : "=r" (rval)\
    );\
    rval;\
})

/* write a CP15 register */
#define mtcp(rn, v) __asm__ __volatile__(\
    "mcr " rn "\n"\
    : : "r" (v)\
)

static void DCacheDisable(void) {
    uint32_t CtrlReg;

    /* read the system control register */
    CtrlReg = mfcp(XREG_CP15_SYS_CONTROL);
    /* clear the C bit; the cache is not cleaned or invalidated here */
    CtrlReg &= ~(XREG_CONTROL_DCACHE_BIT);
    /* write the value back to disable the data cache */
    mtcp(XREG_CP15_SYS_CONTROL, CtrlReg);
}

int main(void) {
    DCacheDisable();
    atomic_flag flag = ATOMIC_FLAG_INIT;
    printf("Before\n");
    atomic_flag_test_and_set(&flag);
    printf("After\n");
}
The CPU executes the following loop for atomic_flag_test_and_set:
dmb ish
ldrexb r1, [r3] ; bne jumps here
strexb r0, r2, [r3]
cmp r0, #0
bne -20 ; addr=0x1f011614: main + 0x00000060
dmb ish
but the register r0 always stays 1.
When the call to DCacheDisable is omitted, the code works flawlessly.
I really can't find any information about a disabled dcache and atomic flags.
Does anybody have a clue?
Toolchain: We are using Vitis 2022.2, which comes with arm-xilinx-eabi-gcc.exe (GCC) 11.2.0. Compiler options are -O2 -std=c11 -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=hard
This is common on ARM platforms that have a cache. The cache line is used as temporary storage for the exclusive lock. The ARM term is the Exclusives Reservation Granule (ERG), which is the size of the memory region covered by the lock. On systems with a cache, you will typically find that the granule is the size of one cache line.
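So internally, ldrex and strex are implemented as part of the cache resolution policy. With the dcache disabled there is no cache line to hold the reservation, which would explain why strexb keeps reporting failure and r0 stays 1. You can compare this to Cortex-M systems, where the entire memory space is a single reservation granule.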
The ldrex/strex pair is useless for synchronizing with external devices that are not part of the AXI structure. If you want to disable the cache in order to work with an FPGA interface, I don't believe this can work; you would need to implement the cache protocol in the FPGA.
For Cortex-M systems, there is no cache structure and custom logic implements a 'global monitor'.
The cache mechanism actually seems useful, as the cache line could be used as a transactional memory, i.e. either the whole line commits or it does not. It seems possible to create lock-free algorithms for structures with multiple pointers, where the nodes do not lock an entire list but only one entry. However, I have never seen it used like this, mainly, I think, because the ARM documentation recommends against it (do not rely on the ERG size).