Register access suppresses otherwise possible optimization (avr-gcc)

The following code contains a simple example for e.g. an avr128da32 MCU. One can access a SFR via the old macros like VPORTA_DIR or via a structure mapping like VPORTA.DIR.

#include <avr/io.h>
#include <stdint.h>

static uint16_t g; 

int main() {
    for(uint8_t i = 0; i < 20; i++) {
        ++g;
//        VPORTA_DIR; // <1> suppresses optimization
        VPORTA.DIR; // <2> OK
    }
}

For case <2> the load/store to g are hoisted/sinked outsie the loop as expected.

But for case <1> this does not happen - but the code should be identical.

What does avr-gcc suppress from optimizing in case <1>?

Assembly for <1>:

main:
ldi r24,lo8(20)  ;  ivtmp_4,
.L2:
lds r18,g        ;  g, g
lds r19,g+1      ;  g, g
subi r18,-1      ;  tmp48,
sbci r19,-1      ; ,
sts g,r18        ;  g, tmp48
sts g+1,r19      ;  g, tmp48
in r25,0         ;  vol.1_7, MEM[(volatile uint8_t *)0B]
subi r24,lo8(-(-1))      ;  ivtmp_4,
cpse r24,__zero_reg__    ;  ivtmp_4,
rjmp .L2         ;
ldi r24,0                ;
ldi r25,0                ;
ret

Assembly for <2>:

main:
lds r24,g        ;  g_lsm.6, g
lds r25,g+1      ;  g_lsm.6, g
ldi r18,lo8(20)  ;  ivtmp_2,
.L2:
in r19,0         ;  vol.1_7, MEM[(struct VPORT_t *)0B].DIR
subi r18,lo8(-(-1))      ;  ivtmp_2,
cpse r18,__zero_reg__    ;  ivtmp_2,
rjmp .L2         ;
adiw r24,20      ;  tmp49,
sts g,r24        ;  g, tmp49
sts g+1,r25      ;  g, tmp49
ldi r24,0                ;
ldi r25,0                ;
ret

Compilation is done with avr-gcc 12.2 and -Os.

Solution

It would seem to be a library and/or compiler (port) bug and it seems more serious than just a missed optimization. I can reproduce it in avr-gcc 12.2 (and much older) like this:

gcc -std=c11 -pedantic -Wall -Wextra -Os -mmcu=avr51 -fno-strict-aliasing

#include <stdint.h>

typedef struct VPORT_struct 
{     
  volatile uint8_t DIR;  /* Data Direction */     
  volatile uint8_t OUT;  /* Output Value */     
  volatile uint8_t IN;  /* Input Value */     
  volatile uint8_t INTFLAGS;  /* Interrupt Flags */ 
} VPORT_t;

#define VPORTA (*(VPORT_t*) 0x0000)

static uint16_t g; 

int main() {
    for(uint8_t i = 0; i < 20; i++) {
        ++g;
        VPORTA.DIR;
    }
}

Output (completely bananas):

__zero_reg__ = 1
main:
        ldi r24,lo8(20)
.L2:
        lds r18,g
        lds r19,g+1
        subi r18,-1
        sbci r19,-1
        sts g+1,r19
        sts g,r18
        lds r25,0
        subi r24,lo8(-(-1))
        cpse r24,__zero_reg__
        rjmp .L2
        ldi r24,0
        ldi r25,0
        ret

Here the #define VPORTA (*(VPORT_t*) 0x0000) (as done in the AVR library according to OP) is a dirty cast which seems to trip the compiler. Somehow it seems to think VPORT may be an alias for g which is just nonsense. uint16_t is not a compatible type with VPORT_t. None of the exceptions to the strict aliasing rule applies.

Oddly enough it works while strict aliasing is enabled (which should be the default setting -fstrict-aliasing) but not when I disable it -fno-strict-aliasing. And yet there's no case for strict aliasing here. With strict aliasing the expected code is generated, along the lines of:

.L2:
    lds r19,0
    subi r18,lo8(-(-1))
    cpse r18,__zero_reg__
    rjmp .L2

That 0x0000 happens to to be the null pointer constant doesn't seem to matter, I get the same wrong machine code no matter address used. If I do, I get a nonsense warning however:

warning: array subscript 0 is outside array bounds of 'VPORT_t[0]' {aka 'struct VPORT_struct[]'} [-Warray-bounds] | #define VPORTA ((VPORT_t) 0x0001)

The heck? This all just seems completely broken and I can't explain why. Obviously, don't use dirty casts like this, but if that's what you were given from the AVR library, then that library is bad.

Also the compiler should be able to just remove all access to g from the code since it isn't used in any expression containing side effects.

The obvious work-around is to drop the struct:

#define VPORTA_DIR (*(volatile uint8_t*)0x0000u)