c, gcc, embedded, sections, objcopy

.rodata section breaks after moving an object out of the function scope


I'm building a simple payload to execute on an ARM64 system that will print a "Hello, world!" string over UART.

hello-world-payload.c:

#include <stdint.h>

typedef uint32_t u32;

int _start() {  
    const char* txt = "Hello, world!\n";
    volatile u32* uart_wfifo = (volatile u32*)0xc81004c0;
    volatile u32* uart_status = (volatile u32*)0xc81004cc;
    
    u32 i = 0;
    char c = txt[0];
    while (c) {
        // wait for UART availability
        do {} while (! (*uart_status & (1 << 22)) );
        // print 1 character
        *uart_wfifo = (0x000000ff & c);
        c = txt[++i];
    }
    
    while (1) {} // wait for watchdog
}

Makefile:

CROSS_COMPILE ?= aarch64-linux-gnu-
CC      = $(CROSS_COMPILE)gcc
OBJCOPY = $(CROSS_COMPILE)objcopy

AFLAGS  = -nostdlib
CFLAGS  = -O0 -nostdlib
LDFLAGS = -Wl,--build-id=none

all: hello-world-payload.bin

%.elf: %.c
    $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^

%.bin: %.elf
    $(OBJCOPY) -O binary -S -g --strip-unneeded \
        -j .text \
        -j .rodata \
        $< $@

.PHONY: clean
clean:
    rm hello-world-payload.bin

As the cross compiler I use the gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf (AArch64 ELF bare-metal target) toolchain from ARM Developer. With the code above I get a 159-byte binary that works just fine.

Once I move the txt out of the function scope this way:

typedef uint32_t u32;
const char* txt = "Hello, world!\n";
int _start() {

, the payload doesn't run anymore. After loading the payload binary into Ghidra I notice that the code tries to access txt at DAT_000100a0 while in fact it's stored at 0x90.

[Image: Ghidra decompiler listing]

Since txt is const and is already initialized it should belong to the .rodata section which I confirmed by inspecting the assembly output of ${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.s hello-world-payload.c -S, here's an excerpt from it:

    .arch armv8-a
    .file   "hello-world-payload.c"
    .text
    .global txt
    .section    .rodata
    .align  3
.LC0:
    .string "Hello, world!\n"
    .data
    .align  3
    .type   txt, %object
    .size   txt, 8

I made sure I didn't forget to include .rodata in Makefile:

%.bin: %.elf
    $(OBJCOPY) -O binary -S -g --strip-unneeded \
        -j .text \
        -j .rodata \
        $< $@

The environment this binary runs in imposes some constraints, such as a maximum payload size (approx. 29000 bytes in my case), and as far as I understand the binary must begin with the .text section. My goal is to keep the payload as small as possible, but I also want to access various objects from different functions.

I inspected the ${CROSS_COMPILE}readelf -S output for hello-world-payload.o (${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.o hello-world-payload.c):

Section Headers:
  [Nr] Name              Type             Address           Offset       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000     0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000400000  00010000     0000000000000090  0000000000000000  AX       0     0     4
  [ 2] .rodata           PROGBITS         0000000000400090  00010090     000000000000000f  0000000000000000   A       0     0     8
  [ 3] .data             PROGBITS         00000000004100a0  000100a0     0000000000000008  0000000000000000  WA       0     0     8
  [ 4] .comment          PROGBITS         0000000000000000  000100a8     000000000000005d  0000000000000001  MS       0     0     1
  [ 5] .symtab           SYMTAB           0000000000000000  00010108     00000000000001e0  0000000000000018           6     9     8
  [ 6] .strtab           STRTAB           0000000000000000  000102e8     000000000000006f  0000000000000000           0     0     1
  [ 7] .shstrtab         STRTAB           0000000000000000  00010357     0000000000000038  0000000000000000

I see there's a .data section so I tried to add it to the objcopy command in my Makefile:

%.bin: %.elf
    $(OBJCOPY) -O binary -S -g --strip-unneeded \
        -j .text \
        -j .rodata \
        -j .data \
        $< $@

The binary size grows to a whopping 65704 bytes, but even with the .data section Ghidra shows the same DAT_000100a0 reference, with nothing like the "Hello, world!\n" string at that position:

[Image: DAT_000100a0 offset in broken binary with .data section added]

The actual string is at 0x90 as it was before adding the .data section.
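
The sizes and offsets above can be reproduced from the readelf table with a little arithmetic. The sketch below assumes that objcopy -O binary starts the image at the lowest copied section address and fills the gaps between sections; the struct and function names are made up for illustration and are not part of any toolchain.

```c
#include <stdint.h>

/* One ELF section as listed by readelf -S (address and size only). */
struct section {
    uint64_t addr;
    uint64_t size;
};

/* Offset of a section inside a flat image whose first byte corresponds
 * to address `base` (assumed objcopy -O binary behaviour). */
uint64_t flat_offset(struct section s, uint64_t base)
{
    return s.addr - base;
}

/* Size of the flat image: distance from the lowest section start
 * to the highest section end, gaps included. */
uint64_t flat_image_size(const struct section *secs, int count)
{
    uint64_t base = secs[0].addr;
    uint64_t end = 0;
    for (int i = 0; i < count; i++) {
        if (secs[i].addr < base)
            base = secs[i].addr;
        if (secs[i].addr + secs[i].size > end)
            end = secs[i].addr + secs[i].size;
    }
    return end - base;
}
```

With .text at 0x400000 (size 0x90) and .rodata at 0x400090 (size 0xf), the image is 159 bytes and the string sits at offset 0x90. Adding .data at 0x4100a0 (size 8) stretches the image to 0x100a8 = 65704 bytes. What lands at offset 0x100a0 is then the 8-byte pointer object txt, not the characters it points to, which would explain why no string shows up there.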

It is clear to me that the compiler messes up the addresses of the .rodata section where the string resides, but I don't know how to fix it. Adding the .data section didn't help.


Solution

  • On microcontrollers, the content of the .data section commonly needs to be initialized by the start-up code from a section of the same size in non-volatile memory. Apparently your start-up code does not fulfill this requirement for running a C application.

    Contrary to your belief, txt is a separate non-constant variable, because it is a modifiable pointer to the constant text. Your C code specifies that this global variable is initialized with the address of the unnamed string, but no code performs that initialization.

    You can make the global pointer variable constant if you change your code to:

    const char * const txt = "Hello, world!\n";
    

    Now txt is located in .rodata.

    You can avoid the global pointer variable altogether if you change your code to:

    const char txt[] = "Hello, world!\n";
    

    Now txt names the array of characters, which is located in .rodata.


    In the first version of your program, txt was a local (automatic) variable on the stack. The code initialized it with the address of the unnamed string after entering the function _start().
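
The difference between the three declarations discussed above can be checked with sizeof. This is only a sketch for illustration: the names txt_ptr, txt_const_ptr and txt_array and the two helper functions are invented here, and the section placement noted in the comments is what one would typically expect from GCC for a non-PIC build, not something the snippet itself verifies.

```c
#include <stddef.h>

/* Writable pointer to read-only characters: the pointer object itself
 * is a separate variable (typically in .data) holding an address. */
const char *txt_ptr = "Hello, world!\n";

/* Read-only pointer to read-only characters (typically in .rodata).
 * It is still a pointer object that stores an address. */
const char *const txt_const_ptr = "Hello, world!\n";

/* No pointer object at all: txt_array is the characters themselves,
 * including the terminating NUL. */
const char txt_array[] = "Hello, world!\n";

/* Size of the pointer variable (8 on AArch64). */
size_t size_of_ptr_object(void)
{
    return sizeof txt_ptr;
}

/* Size of the array variable: 14 characters plus the NUL. */
size_t size_of_array_object(void)
{
    return sizeof txt_array;
}
```

On AArch64 the pointer object is 8 bytes, matching the 8-byte .data section in the readelf output, while the array is 15 bytes, matching the 0xf size of .rodata. Because both pointer variants store an address fixed at link time, the array form is probably the safer choice if the payload may be loaded at an address other than the one it was linked for.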