I am trying to understand how to compile and link a bare-metal app using LLVM on macOS.
loader.s:
.global _reset
_reset:
# Set up stack pointer
LDR X2, =stack_top
MOV SP, X2
# Magic number
MOV X13, #0x1337
# Call the C entry point, which loops endlessly
BL start
main.c:
void start() {
for(;;) {
}
}
With GNU GCC I would write a linker script, but it seems LLVM doesn't support one. So how can I tell llvm-link to join the object files in the right order and specify an offset for the code?
P.S. I am a newbie at assembler, unfortunately, so it causes me a lot of problems too. In short, I want to execute minimal assembler code and jump to a C function that runs an infinite loop.
The assembler code above was copied, with small changes, from here
novectors.s
.globl _reset
_reset:
mov sp,#0x2000
bl notmain
b .
.data
.word 0x12345678
memmap:
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x1000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
notmain.c:
unsigned int x;
unsigned int y=5;
void notmain ( void )
{
x=y;
}
build
clang -c -Wall -O2 novectors.s -o novectors.o
clang -c -Wall -O2 -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables notmain.c -o notmain.o
ld.lld -nostdlib -T memmap novectors.o notmain.o -o notmain.elf
llvm-objdump -D notmain.elf > notmain.list
llvm-objcopy notmain.elf -O binary notmain.bin
disassembly
Disassembly of section .text:
0000000000001000 <_reset>:
1000: b27303ff orr sp, xzr, #0x2000
1004: 94000002 bl 0x100c <notmain>
1008: 14000000 b 0x1008 <_reset+0x8>
000000000000100c <notmain>:
100c: 90000008 adrp x8, 0x1000 <_reset>
1010: 90000009 adrp x9, 0x1000 <_reset>
1014: b9402908 ldr w8, [x8, #0x28]
1018: b9002128 str w8, [x9, #0x20]
101c: d65f03c0 ret
Disassembly of section .bss:
0000000000001020 <x>:
...
Disassembly of section .data:
0000000000001024 <$d.1>:
1024: 78 56 34 12 .word 0x12345678
0000000000001028 <y>:
1028: 05 00 00 00 .word 0x00000005
hexdump -C notmain.bin
00000000 ff 03 73 b2 02 00 00 94 00 00 00 14 08 00 00 90 |..s.............|
00000010 09 00 00 90 08 29 40 b9 28 21 00 b9 c0 03 5f d6 |.....)@.(!...._.|
00000020 00 00 00 00 78 56 34 12 05 00 00 00 |....xV4.....|
0000002c
You can complicate it from there. Qemu is happy with an elf file, so you do not have to convert to binary. You do need to figure out what the memory space is for the machine you are using (cortex-a72 is the core, not the machine; it's the alphabet/language, not the book).
There are ways to figure this out using the binary format. So far none of the qemu machine uarts I have used require polling, waiting, or init, so you can just jam characters into the tx buffer. So you can make a small position-independent program (no stack, etc.) that gets the pc (on aarch64 you cannot read it as a general register, but a bl leaves the return address in lr) and prints it on the uart (in hex or octal of course; never use printf in bare metal).
I threw in all those command line options because I was getting an eh_frame section; a little Googling produced them and that worked. I did not spend any more time finding out which one mattered.
Now, you are on macOS and I do not know what your clang/llvm toolchain looks like. There are pre-builts that can build for other targets to some extent, but my experience is that you get one linker for the host, not for the targets. So I build clang/llvm from sources specifically for my cross target (gnu style), a separate toolchain for each target. I fought llvm for years using the generic build plus gnu binutils for assembling and linking, and had to start over on the makefiles with each major version. Quite painful. This is much, much less painful.
Where do you find machine info for qemu? Sadly the best way is to get the source code and dig through it; that is the only reliable way I have found. If you find some not overly complicated examples for the same core and machine, you can maybe steal at least a working memory space, and then look at the qemu sources for the uart tx register.
qemu is not a real machine; the more time you spend writing code for qemu, the less likely it is to work on real hardware, even if the machine and core are supposed to match. It's a nice, or at least tolerable, way to learn, and you won't brick any hardware, but some day you have to try hardware and deal with it. You can get a 64-bit arm based Raspberry Pi for a few bucks, and the baremetal forum at their site is very good; it will get you similarly simple bare metal examples that can get you booted. (While technically you can brick a Pi, if I understand right you have to actually work at it; in general you pop the sd card out and try again.)
llvm claims gnu compatibility: clang as an assembler (ewww), clang versus gcc, their linker, and so on. It is close but not 100%. Beware that statements like that are generally going to be false, and again, the more code you write without checking, the bigger the failure.
You need to get the arm documentation for the core, cortex-whatever. It is called a Technical Reference Manual (TRM); get it directly from arm, not from someone else. The TRM will tell you the architecture (armv8-a or something), and you will need the Architecture Reference Manual (ARM ARM) for that architecture. You do NOT need the programmers reference, which creates more confusion than it solves. Also do not look at some generic arm 64 bit instruction reference on their site; again, more problems than solutions. You need the ARM TRM and ARM ARM for your core. The chip and other stuff is not arm, so you have to find that from the chip vendor, or in the case of qemu from the qemu sources.
In the ARM ARM you will find how the aarch64 cores boot and how their vectors work, which is not how the 32 bit ones work and not how the cortex-ms work; they are three different solutions. (Then adjust your bootstrap accordingly to place the handlers at the addresses the logic/hardware defines.)
I find the gnu tools easier and you will find considerably more examples out there for gnu. Clang/llvm is not a drop in replacement. Gnu is easier to come by IMO and/or easier to build (cmake sucks, if nothing else it takes almost an order of magnitude longer to build).
The above could be simpler, but this is not a bad starting point, then you can complicate it from there if you feel the need. Most folks grossly overcomplicate everything, find your ideal base/skeleton.
Okay, the thing you linked (it is bad to put external links on this site, they are not guaranteed to remain up) covers all of this topic, so I am not sure why you are asking again here.
bootstrap
.globl _reset
_reset:
mov x0,#0x09000000
mov w1,#0x55
loop:
strb w1,[x0]
b loop
linker script
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x40000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
}
build
clang -c -Wall -O2 novectors.s -o novectors.o
ld.lld -nostdlib -T memmap novectors.o -o notmain.elf
llvm-objdump -D notmain.elf > notmain.list
Always check the disassembly first.
Disassembly of section .text:
0000000040000000 <_reset>:
40000000: d2a12000 mov x0, #0x9000000
40000004: 52800aa1 mov w1, #0x55
0000000040000008 <loop>:
40000008: 39000001 strb w1, [x0]
4000000c: 17ffffff b 0x40000008 <loop>
run it
qemu-system-aarch64 -M virt -cpu cortex-a72 -m 128M -nographic -kernel notmain.elf
and it spams the screen with the letter U (0x55).
ctrl-a then x (at least on linux) to kill it.
With C.
bootstrap
.globl _reset
_reset:
mov x10,#0x48000000
mov sp,x10
bl notmain
b .
.globl write32
write32:
str w1,[x0]
ret
c code
/* address is unsigned long (64 bits) because write32 uses all of x0 */
extern void write32 ( unsigned long, unsigned int );
void notmain ( void )
{
unsigned int ra;
for(ra=0x30;;ra++)
{
ra&=0x37;
write32(0x09000000,ra);
}
}
linker script
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x40000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
}
build
clang -c -Wall -O2 novectors.s -o novectors.o
clang -c -Wall -O2 -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables notmain.c -o notmain.o
ld.lld -nostdlib -T memmap novectors.o notmain.o -o notmain.elf
llvm-objdump -D notmain.elf > notmain.list
run the same way as above and it spews 0123456701234567 until you stop it.
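The wrap-around in that loop can be checked on the host before running it under qemu. This is only a sketch: `uart_sequence` is a made-up name, and it appends to a buffer instead of storing to the uart at 0x09000000.

```c
#include <stddef.h>

/* Host-side model of the notmain() loop above. Each byte that would be
   written to the uart is appended to out[] instead. out must have room
   for count + 1 bytes. (uart_sequence is a hypothetical helper.) */
void uart_sequence(char *out, size_t count)
{
    unsigned int ra = 0x30;              /* ASCII '0' */
    for (size_t i = 0; i < count; i++, ra++) {
        ra &= 0x37;                      /* 0x38 wraps back to 0x30 */
        out[i] = (char)ra;
    }
    out[count] = '\0';
}
```

Asking for 16 characters gives the same 0123456701234567 pattern that shows up on the screen.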
Let's see where qemu places a raw binary (objcopy -O binary) in memory...
linker script
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
}
bootstrap
.globl _reset
_reset:
mov x10,#0x48000000
mov sp,x10
bl notmain
b .
.globl write32
write32:
str w1,[x0]
ret
.globl getpc
getpc:
mov x1,lr
bl here
here:
mov x0,lr
mov lr,x1
ret
c code
extern unsigned int getpc ( void );
/* address is unsigned long (64 bits) because write32 uses all of x0 */
extern void write32 ( unsigned long, unsigned int );
void notmain ( void )
{
unsigned int ra;
unsigned int rb;
unsigned int rc;
ra=getpc();
rb=32;
while(1)
{
rb-=4;
rc=(ra>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
write32(0x09000000,rc);
if(rb==0) break;
}
write32(0x09000000,0x0D);
write32(0x09000000,0x0A);
}
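The nibble-to-ASCII arithmetic is easy to get wrong, so it can be worth testing on the host first. A minimal sketch, assuming a buffer in place of the uart; `hex32` is a name made up for this example.

```c
#include <stdint.h>

/* Format a 32-bit value as 8 uppercase hex digits, most significant
   nibble first, using the same arithmetic as the loop in notmain().
   (hex32 is a hypothetical helper, not part of the bare metal build.) */
void hex32(uint32_t value, char out[9])
{
    unsigned int shift = 32;
    int i = 0;
    while (shift != 0) {
        shift -= 4;
        unsigned int nibble = (value >> shift) & 0xF;
        /* 0..9 map to '0'..'9' (0x30..0x39), 10..15 to 'A'..'F' (0x41..0x46) */
        out[i++] = (char)(nibble > 9 ? nibble + 0x37 : nibble + 0x30);
    }
    out[i] = '\0';
}
```

No printf is involved, so the same few lines carry over to the bare metal side unchanged except for where the characters go.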
build
clang -c -Wall -O2 novectors.s -o novectors.o
clang -c -Wall -O2 -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables notmain.c -o notmain.o
ld.lld -nostdlib -T memmap novectors.o notmain.o -o notmain.elf
llvm-objdump -D notmain.elf > notmain.list
llvm-objcopy notmain.elf -O binary notmain.bin
run
qemu-system-aarch64 -M virt -cpu cortex-a72 -m 128M -nographic -kernel notmain.bin
40080020
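The printed value is the address of the here label, not the load address itself. With the image linked at 0 and novectors.o first on the link line, here sits eight instructions into the image, so the load address falls out by subtraction. A small sketch of that arithmetic; `HERE_OFFSET` and `load_base` are names made up for this example.

```c
#include <stdint.h>

/* Offset of the `here` label from the start of the image, counted from
   the bootstrap above: 4 instructions in _reset, 2 in write32 and 2 in
   getpc before `here`, 4 bytes each. (Hypothetical names.) */
#define HERE_OFFSET (8u * 4u)

/* Address the raw binary was loaded at, given the value getpc printed. */
uint32_t load_base(uint32_t printed_pc)
{
    return printed_pc - HERE_OFFSET;
}
```

For the value printed above this gives 0x40080000.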
So if we switch the linker script origin to match
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x40080000, LENGTH = 0x1000
nada: ORIGIN = 0xFFFFFFF0, LENGTH = 0
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > nada
.data : { *(.data*) } > nada
}
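You can build and run either the elf or the bin file and it should be just fine. The zero-length nada region is there so that the link fails if any .bss or .data sneaks in. (With the .bin you can trivially have .bss and .data; for the elf you have to do some ugly linker script stuff and then some bootstrap.)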
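For reference, the bootstrap work mentioned there usually amounts to copying .data into place and zeroing .bss before calling the C entry point. This is a generic sketch, not taken from the examples here: the pointer arguments stand in for symbols a linker script would have to export, and none of the scripts in this answer define any.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialized data from its load address to its run address, then
   zero .bss. The arguments stand in for linker-script symbols
   (hypothetical; the scripts in this answer do not define them). */
void init_ram(uint32_t *data_dst, const uint32_t *data_src, size_t data_words,
              uint32_t *bss, size_t bss_words)
{
    for (size_t i = 0; i < data_words; i++)
        data_dst[i] = data_src[i];
    for (size_t i = 0; i < bss_words; i++)
        bss[i] = 0;
}
```

On real bare metal this is normally done in the bootstrap assembly, or in C that itself touches no .data or .bss until it finishes.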
If you do not let yourself use any .data and do not expect .bss to be zeroed, then your bootstrap and linker script can be trivial.
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x40080000, LENGTH = 0x1000
nada: ORIGIN = 0xFFFFFFF0, LENGTH = 0
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > nada
}
and
mov x10,#0x48000000
mov sp,x10
bl notmain
b .
If you use the .bin file format then you can use the example at the top and get zeroed .bss and initialized .data trivially: .bss sits between .text and .data, so objcopy pads it with zeros in the file.
ENTRY(_reset)
MEMORY
{
ram : ORIGIN = 0x40080000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
and
.globl _reset
_reset:
mov x10,#0x48000000
mov sp,x10
bl notmain
b .
...
.data
.word 0x12345678
But this does not work with the elf format.
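Why the .bin gets a zeroed .bss for free can be seen with a simplified model of a flat binary. This is not how objcopy is implemented, only a sketch of the layout rule: bytes are placed by address from the lowest section with file contents to the highest, and any gap in between, such as a .bss placed ahead of .data, comes out as zeros. `struct section` and `flatten` are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct section {
    uint32_t addr;         /* link address */
    size_t size;           /* size in bytes */
    const uint8_t *bytes;  /* contents, or NULL for .bss (nothing in the file) */
};

/* Simplified model of a flat binary image (hypothetical helper). Returns
   the image length, or 0 if there is nothing to emit or img is too small. */
size_t flatten(const struct section *s, size_t n, uint8_t *img, size_t cap)
{
    uint64_t lo = UINT64_MAX, hi = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].bytes == NULL || s[i].size == 0)
            continue;                      /* no file contents */
        uint64_t end = (uint64_t)s[i].addr + s[i].size;
        if (s[i].addr < lo) lo = s[i].addr;
        if (end > hi) hi = end;
    }
    if (hi <= lo || hi - lo > cap)
        return 0;
    memset(img, 0, (size_t)(hi - lo));     /* gaps read back as zeros */
    for (size_t i = 0; i < n; i++) {
        if (s[i].bytes == NULL || s[i].size == 0)
            continue;
        memcpy(img + (s[i].addr - lo), s[i].bytes, s[i].size);
    }
    return (size_t)(hi - lo);
}
```

Fed the layout from the first example (.text at 0x1000 for 0x20 bytes, .bss at 0x1020 for 4 bytes, .data at 0x1024 for 8 bytes) it produces a 0x2c byte image with zeros at offset 0x20, which matches the hexdump at the top. If .bss came last instead, it would simply not be in the file.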