assembly gcc arm thumb arm-none-eabi-gcc

Why does GCC produce an extra ADDS instruction after LDR when loading a .rodata pointer on the ARM Thumb instruction set?


This code:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };

const char myTable[] = { 1, 2, 3, 4 };

int keepPadding() {
  return (int)(&padding);
}

int foo() {
  return (int)(&myTable);  // <-- this is the part I'm looking at
}

compiles to the following assembly for the thumb instruction set (abbreviated for clarity). Note particularly the adds as the second instruction of foo:

...
foo:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    adds    r0, r0, #10
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0
    .size   foo, .-foo
    .align  1
    .global bar
    .syntax unified
    .code   16
    .thumb_func
    .type   bar, %function

...
myTable:
    .ascii  "\001\002\003\004"

It looks like it's loading a pointer (ldr) to the top of .rodata and then programmatically offsetting to the location of myTable (adds). But why not just load the address of the table itself directly?

Note: when I remove the const, the ADDS instruction disappears (and myTable ends up in .data).

The context of the question is that I'm trying to hand-optimize some C firmware and noticed this adds instruction that seems to be superfluous, so I'm wondering if there's a way to restructure my code to get rid of it.

Note: this is all compiled for the ARM thumb instruction set as follows (using arm-none-eabi-gcc version 11.2.1):

arm-none-eabi-gcc -Os -c -mcpu=cortex-m0 -mthumb temp.c -S

Also note: the example code here is intended to represent a snippet of a larger codebase. If myTable were the only thing compiled, it would land at offset 0 in .rodata and the adds instruction would disappear, but that is not the typical case in a real-world scenario. To represent the typical real-world scenario that produces this assembly, I added padding before the table.

It is also reproduced on Godbolt.


Solution

  • The question originally contained just this:

    const char myTable[] = { 1, 2, 3, 4 };
    int foo() {
      return (int)(&myTable);
    }
    
    
    arm-none-eabi-gcc -Os -c -mthumb so.c -o so.o
    arm-none-eabi-objdump -D so.o
    

    but it did not produce the adds:

    Disassembly of section .text:
    
    00000000 <foo>:
       0:   4800        ldr r0, [pc, #0]    ; (4 <foo+0x4>)
       2:   4770        bx  lr
       4:   00000000    andeq   r0, r0, r0
    
    Disassembly of section .rodata:
    
    00000000 <myTable>:
       0:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff
    

    The question has since been edited to show a repeatable example, and this answer has been edited to match. I will still work toward the same solution step by step, since it may be of interest that it took a few components to reach the anchor without the problem being optimized out.

    So from your question and this:

    const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
    const char myTable[] = { 1, 2, 3, 4 };
    int foo() {
      return (int)(&myTable);
    }
    

    It is obvious why myTable is at an offset of 10.

    But padding is optimized out so you still end up with the same result.

    So:

    const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
    const char myTable[] = { 1, 2, 3, 4 };
    int keepPadding() {
      return (int)(&padding);
    }
    int foo() {
      return (int)(&myTable);
    }
    

    The name of that function implies you know all of this already and know what it took to make a minimum example, etc.

    arm-none-eabi-gcc -Os -c -mthumb so.c -S
    
    
    foo:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr r0, .L5
        @ sp needed
        adds    r0, r0, #10
        bx  lr
    .L6:
        .align  2
    .L5:
        .word   .LANCHOR0
        .size   foo, .-foo
        .global myTable
        .global padding
        .section    .rodata
        .set    .LANCHOR0,. + 0
        .type   padding, %object
        .size   padding, 10
    padding:
        .space  10
        .type   myTable, %object
        .size   myTable, 4
    myTable:
        .ascii  "\001\002\003\004"
        .ident  "GCC: (GNU) 11.2.0"
    

    GCC is generating an anchor (.LANCHOR0) and referencing myTable as an offset from that anchor rather than directly through its own label.

    I suspect it is to allow for an optimization of the ldr. Let's try:

     arm-none-eabi-gcc -Os -c -mthumb -mcpu=cortex-m4 so.c -S
    
    foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr r0, .L5
        bx  lr
    .L6:
        .align  2
    .L5:
        .word   .LANCHOR0+10
        .size   foo, .-foo
    
    00000008 <foo>:
       8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
       a:   4770        bx  lr
       c:   0000000a    .word   0x0000000a
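
The folded literal `.LANCHOR0+10` has a rough analogue in C: a static initializer made of an address plus an integer constant, which the toolchain resolves before the program runs. The sketch below only models that idea on a host compiler; the names `rodata_block`, `table_folded` and `get_table_folded` are hypothetical and are not what GCC emits for the example.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stddef.h>

/* Hypothetical stand-in for the .rodata block: 10 padding bytes, then the table. */
static const char rodata_block[14] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   /* plays the role of padding */
    1, 2, 3, 4                      /* plays the role of myTable */
};

/* Address constant plus integer constant: the sum is resolved by the
 * toolchain (via a relocation), loosely like `.word .LANCHOR0+10`. */
static const char *const table_folded = rodata_block + 10;

/* Returns the pre-computed pointer; this model needs no add at run time. */
const char *get_table_folded(void) {
    return table_folded;
}
```

In this model, `assert(get_table_folded() == &rodata_block[10]);` should hold, which mirrors the single `ldr` with no following `adds`.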
    

    Yes, so that fixed it: the offset is now folded into the literal (.LANCHOR0+10). But what about the original build once it is linked?

    Disassembly of section .rodata:
    
    00000000 <padding>:
        ...
    
    0000000a <myTable>:
       a:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff
    
    Disassembly of section .text:
    
    00000010 <keepPadding>:
      10:   4800        ldr r0, [pc, #0]    ; (14 <keepPadding+0x4>)
      12:   4770        bx  lr
      14:   00000000    andeq   r0, r0, r0
    
    00000018 <foo>:
      18:   4801        ldr r0, [pc, #4]    ; (20 <foo+0x8>)
      1a:   300a        adds    r0, #10
      1c:   4770        bx  lr
      1e:   46c0        nop         ; (mov r8, r8)
      20:   00000000    andeq   r0, r0, r0
    

    Nope. I was hoping the linker would replace the pc-relative load with a mov r0, #0, saving the load, which is (or might be) an optimization on systems that are not Cortex-M (or even on Cortex-M).

    Note: disabling section anchors also works:

    arm-none-eabi-gcc -Os -c -mthumb -fno-section-anchors so.c -o so.o
    
    00000008 <foo>:
       8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
       a:   4770        bx  lr
       c:   00000000    andeq   r0, r0, r0
    
    foo:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr r0, .L5
        @ sp needed
        bx  lr
    .L6:
        .align  2
    .L5:
        .word   myTable
        .size   foo, .-foo
        .global myTable
        .section    .rodata
        .type   myTable, %object
        .size   myTable, 4
    myTable:
        .ascii  "\001\002\003\004"
        .global padding
        .type   padding, %object
        .size   padding, 10
    

    The anchor was not used so the address of myTable was used directly.

    From my perspective, the "why" is that an anchor was used, and the padding in front of myTable puts it at an offset of 10 from that anchor. So the ldr loads the anchor address, and the adds then gets you from the anchor to the table.
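
The anchor-plus-offset arithmetic described above can be modelled in plain C. This is only a sketch under assumptions: `struct rodata_model`, `rodata`, `table_via_anchor` and `table_direct` are hypothetical names that imitate the layout of the example's .rodata block (10 bytes of padding, then the 4-byte table); they are not something GCC generates.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the .rodata block; its start plays the role of .LANCHOR0. */
struct rodata_model {
    char padding[10];
    char myTable[4];
};

static const struct rodata_model rodata = { {0}, {1, 2, 3, 4} };

/* What foo() does with section anchors: load the anchor, then add the offset. */
uintptr_t table_via_anchor(void) {
    uintptr_t anchor = (uintptr_t)&rodata;                   /* ldr  r0, .L5        */
    return anchor + offsetof(struct rodata_model, myTable);  /* adds r0, r0, #10    */
}

/* What foo() does without anchors: the literal already holds the table address. */
uintptr_t table_direct(void) {
    return (uintptr_t)rodata.myTable;                        /* .word myTable       */
}
```

Both functions yield the same address, so `assert(table_via_anchor() == table_direct());` should hold, and `offsetof(struct rodata_model, myTable)` is 10 because char arrays need no alignment padding. The only difference is whether the addition of 10 happens at run time or is already baked into the loaded word.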

    Why the anchor? Exercise for the reader, or someone else.