cmultithreadinggcccompiler-optimizationbarrier

Compiler barrier in single-threaded code necessary?


I have a piece of code with doubly linked lists (inspired by the linux lists) that works without compiler optimizations. When I activate compiler optimization, it fails.

My program adds an element to the list, then it loops over the list while it is not empty. In the loop body the first element is fetched, then an assert checks that the element is in the list, then the element is removed. This assert fails with compiler optimizations (-O2). With a compiler barrier, there is no error.

I assume that in the error case the compiler does not reload the elements of the list in every while (!list_empty(&li)) loop iteration. When I look at the compiled code of the main function, I see that there is no conditional code and __assert_fail is called at the end. But why? I always thought that in single-threaded code it is safe to program without barriers.

This is my Code:

#include <stdlib.h>
#include <stddef.h>
#include <stdio.h>
#include <assert.h>

struct list_elem {
    struct list_elem *next;
    struct list_elem *prev;
};

struct list {
    struct list_elem *next;
    struct list_elem *prev;
};

static inline void init_list_head(struct list *list)
{
    list->next = (struct list_elem *)list;
    list->prev = (struct list_elem *)list;
}

static inline void init_list_elem(struct list_elem *elem)
{
    elem->next = NULL;
    elem->prev = NULL;
}

static inline void *list_entry(struct list_elem *elem, size_t offset)
{
    return (char *)(elem) - offset;
}

static inline int list_empty(struct list *list)
{
    return list->next == (struct list_elem *)list;
}

static inline struct list_elem *first_list_elem(struct list *list)
{
    return list->next;
}

static inline int elem_not_in_list(struct list_elem *elem)
{
    return elem->next == NULL;
}

static inline void __list_add(struct list_elem *elem, struct list_elem *prev, struct list_elem *next)
{
    next->prev = elem;
    elem->next = next;
    elem->prev = prev;
    prev->next = elem;
}

static inline void list_add(struct list_elem *elem, struct list *list)
{
    __list_add(elem, (struct list_elem *)list, list->next);
}

static inline void list_del(struct list_elem *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    elem->next = NULL;
    elem->prev = NULL;
}

struct record {
    int value;
    struct list_elem link;
};

struct record r1;
struct list li;

int main(int argc, char *argv[])
{
    init_list_head(&li);
    init_list_elem(&r1.link);

    r1.value = 1;
    list_add(&r1.link, &li);

    while (!list_empty(&li)) {
        struct record *r = list_entry(first_list_elem(&li), offsetof(struct record, link));
        assert(!elem_not_in_list(&r->link));
        list_del(&r->link);
        r1.value += argc;
        // compiler barrier
        //asm volatile("": : :"memory");
    }

    return r1.value;
}

Compiler:

gcc version 11.2.0 (GCC) 
Target: arm-linux-musleabihf

Compile command:

gcc -O2 -o testbarrier testbarrier.c

Compiled code:

00010290 <main>:
   10290:   e59f1030    ldr r1, [pc, #48]   ; 102c8 <main+0x38>
   10294:   e3a0c000    mov ip, #0
   10298:   e2800001    add r0, r0, #1
   1029c:   e92d4010    push    {r4, lr}
   102a0:   e3a02057    mov r2, #87 ; 0x57
   102a4:   e5810008    str r0, [r1, #8]
   102a8:   e5811000    str r1, [r1]
   102ac:   e5811004    str r1, [r1, #4]
   102b0:   e581c00c    str ip, [r1, #12]
   102b4:   e581c010    str ip, [r1, #16]
   102b8:   e59f300c    ldr r3, [pc, #12]   ; 102cc <main+0x3c>
   102bc:   e59f100c    ldr r1, [pc, #12]   ; 102d0 <main+0x40>
   102c0:   e59f000c    ldr r0, [pc, #12]   ; 102d4 <main+0x44>
   102c4:   ebffffe8    bl  1026c <__assert_fail@plt>
   102c8:   0002103c    .word   0x0002103c
   102cc:   000104dc    .word   0x000104dc
   102d0:   000104b0    .word   0x000104b0
   102d4:   000104c0    .word   0x000104c0

Solution

  • The code has undefined behavior: you cast list as a (struct list_elem *) and access its fields via a different type. This is a violation of the strict aliasing rule. The compiler may generate code that appears to work as expected with no optimisations, but when you enable optimisations with -O2, the compiler makes a deeper analysis of the code and assumes you do not violate this rule. As a consequence, the code generated may not work as expected anymore.

    To avoid this undefined behavior, either