I'm building a simple payload to execute on an ARM64 system that will print a "Hello, world!" string over UART.
hello-world-payload.c:
#include <stdint.h>
typedef uint32_t u32;
int _start() {
    const char* txt = "Hello, world!\n";
    volatile u32* uart_wfifo = (volatile u32*)0xc81004c0;
    volatile u32* uart_status = (volatile u32*)0xc81004cc;
    u32 i = 0;
    char c = txt[0];
    while (c) {
        // wait for UART availability
        do {} while (! (*uart_status & (1 << 22)) );
        // print 1 character
        *uart_wfifo = (0x000000ff & c);
        c = txt[++i];
    }
    while (1) {} // wait for watchdog
}
Makefile:
CROSS_COMPILE ?= aarch64-linux-gnu-
CC = $(CROSS_COMPILE)gcc
OBJCOPY = $(CROSS_COMPILE)objcopy
AFLAGS = -nostdlib
CFLAGS = -O0 -nostdlib
LDFLAGS = -Wl,--build-id=none
all: hello-world-payload.bin
%.elf: %.c
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
%.bin: %.elf
	$(OBJCOPY) -O binary -S -g --strip-unneeded \
	-j .text \
	-j .rodata \
	$< $@
.PHONY: clean
clean:
	rm hello-world-payload.bin
As the cross compiler I use the gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf (AArch64 ELF bare-metal target) toolchain from ARM Developer. With the code above I get a 159-byte binary that works just fine.
Once I move txt out of the function scope this way:
typedef uint32_t u32;
const char* txt = "Hello, world!\n";
int _start() {
the payload doesn't run anymore. After loading the payload binary into Ghidra, I noticed that the code tries to access txt at DAT_000100a0, while the string is in fact stored at 0x90.
Since txt is const and already initialized, it should belong to the .rodata section, which I confirmed by inspecting the assembly output of ${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.s hello-world-payload.c -S. Here's an excerpt from it:
.arch armv8-a
.file "hello-world-payload.c"
.text
.global txt
.section .rodata
.align 3
.LC0:
.string "Hello, world!\n"
.data
.align 3
.type txt, %object
.size txt, 8
I made sure I didn't forget to include .rodata in the Makefile:
%.bin: %.elf
$(OBJCOPY) -O binary -S -g --strip-unneeded \
-j .text \
-j .rodata \
$< $@
The environment this binary runs in imposes some constraints, such as a maximum payload size (approx. 29000 bytes in my case), and as far as I understand the binary must begin with the .text section. My goal is therefore to keep the payload as small as possible while still being able to access various objects from different functions.
I inspected the ${CROSS_COMPILE}readelf -S output for hello-world-payload.o (built with ${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.o hello-world-payload.c):
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000400000 00010000 0000000000000090 0000000000000000 AX 0 0 4
[ 2] .rodata PROGBITS 0000000000400090 00010090 000000000000000f 0000000000000000 A 0 0 8
[ 3] .data PROGBITS 00000000004100a0 000100a0 0000000000000008 0000000000000000 WA 0 0 8
[ 4] .comment PROGBITS 0000000000000000 000100a8 000000000000005d 0000000000000001 MS 0 0 1
[ 5] .symtab SYMTAB 0000000000000000 00010108 00000000000001e0 0000000000000018 6 9 8
[ 6] .strtab STRTAB 0000000000000000 000102e8 000000000000006f 0000000000000000 0 0 1
[ 7] .shstrtab STRTAB 0000000000000000 00010357 0000000000000038 0000000000000000
I see there's a .data section, so I tried adding it to the objcopy command in my Makefile:
%.bin: %.elf
$(OBJCOPY) -O binary -S -g --strip-unneeded \
-j .text \
-j .rodata \
-j .data \
$< $@
The binary size grows to a whopping 65704 bytes, but even with the .data section Ghidra shows the same DAT_000100a0 reference, with nothing like the "Hello, world!\n" string at that position. The actual string is still at 0x90, as it was before adding the .data section.
It seems clear to me that the compiler messes up the address of the .rodata section where the string resides, but I don't know how to fix it. Adding the .data section didn't help.
Commonly on microcontrollers, the contents of the .data section must be initialized by the start-up code, which copies them from an equally sized image in non-volatile memory. Apparently your start-up code does not fulfill this requirement for running a C application.
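To illustrate what such start-up code typically does, here is a minimal sketch of the copy step. It is an assumption-laden illustration, not your platform's actual boot code: the function name copy_data_section and its parameters are hypothetical, and in a real bare-metal build the three addresses would usually come from symbols defined in a linker script.

```c
#include <stdint.h>

/*
 * Copy the initial values of .data from where the image stores them
 * (the load address) to where the program expects to find them
 * (the run address). All names here are hypothetical.
 */
void copy_data_section(uint8_t *run_start,
                       const uint8_t *load_start,
                       const uint8_t *load_end)
{
    while (load_start < load_end) {
        *run_start++ = *load_start++;
    }
}
```

Start-up code would call something like this once, before any C code reads an initialized global variable.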
Contrary to your belief, txt is a separate, non-constant variable, because it is a modifiable pointer to the constant text. Your C code specifies that this global variable is initialized with the address of the unnamed string, but no code does this.
You can make the global pointer variable constant by changing your code to:
const char * const txt = "Hello, world!\n";
Now txt is located in .rodata.
You can avoid the global pointer variable altogether by changing your code to:
const char txt[] = "Hello, world!\n";
Now txt names the array of characters, which is located in .rodata.
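The three declarations can be compared side by side. This is only a sketch: the variable and function names are hypothetical, and the section placement noted in the comments is what GCC typically does for a non-PIC build, not a guarantee.

```c
#include <stddef.h>

/* Modifiable pointer to constant text: the pointer object itself
 * typically lands in .data, the characters in .rodata. */
const char *txt_mutable_ptr = "Hello, world!\n";

/* Constant pointer to constant text: the pointer object can now be
 * placed in .rodata as well. */
const char *const txt_const_ptr = "Hello, world!\n";

/* Array of constant characters: no pointer object exists at all. */
const char txt_array[] = "Hello, world!\n";

/* Size of the pointer object, independent of the string length. */
size_t pointer_variant_size(void) { return sizeof txt_const_ptr; }

/* Size of the array: 14 characters plus the terminating NUL. */
size_t array_variant_size(void) { return sizeof txt_array; }
```

The array size of 15 bytes matches the 0xf size of .rodata in the readelf output above, while each pointer variant adds a pointer-sized object (8 bytes on AArch64) on top of the string.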
In the first version of your program, txt was an automatic (local) variable on the stack. The code initialized it with the address of the unnamed string after entering the function _start().
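Putting the array form together with the loop from the question, a sketch could look like the following. The helper uart_puts and the macro UART_TX_READY are hypothetical names introduced for illustration; passing the register pointers as parameters is a design choice that lets the loop be exercised on a host, and bit 22 is used exactly as in the question without any claim about its documented meaning.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Status bit polled in the question's code (hypothetical name). */
#define UART_TX_READY (1u << 22)

/* Array form: the characters live in .rodata, no pointer object. */
const char txt[] = "Hello, world!\n";

/* Write a NUL-terminated string, one character at a time. */
void uart_puts(volatile u32 *wfifo, volatile u32 *status, const char *s)
{
    for (; *s != '\0'; ++s) {
        /* wait for UART availability */
        while (!(*status & UART_TX_READY)) {
        }
        /* print 1 character */
        *wfifo = (u32)(unsigned char)*s;
    }
}
```

On the target, _start() would call uart_puts((volatile u32 *)0xc81004c0, (volatile u32 *)0xc81004cc, txt) and then spin until the watchdog fires, as before.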