I am just starting to learn ARM assembly on my Apple silicon (M2) Mac. I wrote a program that takes its command-line arguments (argv), prints them using the write system call, and returns their number (argc).
The program works: for argv[0] it prints the path to the binary exactly as I typed it when calling the program.
But when I use lldb to examine the memory location from which I am convinced argv[0] is taken, it always contains the absolute path.
Is this because lldb always runs the program using the absolute path? Is there a way to find out? If yes, is that what lldb should do, or is it a bug?
Here is the source code for my program.
// ARM assembly program on M2 for macOS 14.7.1
// print argv separated by newlines, return argc
.global _start
.p2align 2
// input from OS: W0 ... argc
//                X1 ... char **argv
//                argv[0] points to NUL-separated concatenation
//                of elements of argv (for some reason)
//
// WORKING MEM:   W19 argc
//                X1  previous *argv for print
//                X2  current str length
//                W21 argc loop decr counter
//                X22 char * argv loop incr counter
//                X23 char * newline

_start:
    mov W19, W0             // W0 holds the number of args, copy
    adr X23, chr_newline    // make *"\n" available for printing
    // set up loop to print all arguments
    mov W21, W19            // put argc into loop counter
    ldr X22, [X1]           // X22 := argv[0] (a char *)
loop_argv:
    bl handle_arg           // print one argument
    sub W21, W21, #1        // decr loop counter
    cmp W21, #0             // loop if > 0
    b.gt loop_argv
    // exit
    mov W0, W19             // return code := argc
    mov X16, #1             // service code for termination
    svc #0x80               // make sys call
// local function handle_arg
handle_arg:
    mov X1, X22             // save start char * in X1
    mov X2, #0              // X2 should contain len at end
count_chars_loop:           // search for NUL char separating args
    ldrb W0, [X22], #1      // W0 = *X22, then X22 += 1
    cmp W0, #0              // check if the byte just read was NUL
    add X2, X2, #1          // incr len (add does not change the flags)
    b.gt count_chars_loop
    sub X2, X2, #1          // correct for overcounting
    // X22 = start of next arg now
    // print argv[i]
    mov X0, #1              // to stdout
    // start of current arg is already in X1
    // len(argv[i]) is already in X2
    mov X16, #4             // nr for write call
    svc #0x80               // make sys call
    // print newline
    mov X0, #1              // to stdout
    mov X1, X23             // X1 = char * newline
    mov X2, #1              // len("\n")
    mov X16, #4             // nr for write call
    svc #0x80               // make sys call
    ret
.align 2
chr_newline: .ascii "\n"
I compile and link it using
as get_args_min.s -o get_args_min.o
ld -o bin/get_args_min get_args_min.o -lSystem -syslibroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -e _start -arch arm64
Here is what I see on the command line:
me@c get_args % ./bin/get_args_min test test
./bin/get_args_min
test
test
me@c get_args %
Note the relative path. (I tried calling it with the absolute path, too, and then I do get it on the terminal.) But the location we print from seems to always contain the full absolute path to the binary. To check this, I used
lldb -- ./bin/get_args_min test test
...then the lldb commands
b handle_arg
r
re r
...then copied the address in X22, then
memory read [PASTE]
This is likely caused by the way lldb launches your program, namely using its absolute path, whereas the shell uses the path you specified (relative or absolute).
When you start lldb, it shows the executable it will launch. Even if you don't add a directory prefix to the executable path, it sets the executable to its absolute path:
$ lldb -- get_args_min foo bar
(lldb) target create "get_args_min"
Current executable set to '/tmp/get_args_min' (arm64).
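The effect can be imitated in a few lines of C. This is only a sketch, under the assumption that lldb resolves the path you typed roughly the way realpath() does; I have not checked how lldb actually implements this, and the helper name resolve_like_debugger is made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn the path the user typed into an absolute one,
 * which is what the "Current executable set to ..." line suggests happens.
 * Returns 0 on success, -1 if the path cannot be resolved or does not fit. */
int resolve_like_debugger(const char *typed_path, char *out, size_t out_size)
{
    assert(typed_path != NULL && out != NULL && out_size > 0);

    /* With NULL as second argument, realpath() allocates the result itself. */
    char *resolved = realpath(typed_path, NULL);
    if (resolved == NULL)
        return -1;

    if (strlen(resolved) >= out_size) {
        free(resolved);
        return -1;
    }
    strcpy(out, resolved);
    free(resolved);
    return 0;
}
```

If a launcher passes the resolved string on as argv[0], the program sees an absolute path no matter what was typed on the command line.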
The way I understand it, from man execve, is that the value of argv[0] isn't standardised, it's up to the calling program (so the shell, or lldb) to set it.
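To see that argv[0] really is just a string chosen by the caller, here is a small C sketch (C rather than assembly, to keep it short). It runs /bin/sh with an arbitrary argv[0] and reads back what the shell reports as $0. The helper name argv0_seen_by_child is made up, and the sketch assumes that /bin/sh exists and that it reports its argv[0] as $0 when -c is given without further operands, which is how bash and dash behave as far as I know.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run /bin/sh with an argv[0] chosen by the caller and
 * copy what the shell reports as $0 into out.
 * Returns 0 on success, -1 on error. */
int argv0_seen_by_child(const char *chosen_argv0, char *out, size_t out_size)
{
    assert(chosen_argv0 != NULL && out != NULL && out_size > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: send stdout into the pipe, then exec the shell. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        /* The file to run is the first parameter of execv();
         * argv[0] is a separate string that we are free to choose. */
        char *child_argv[] = { (char *)chosen_argv0, "-c",
                               "printf %s \"$0\"", NULL };
        execv("/bin/sh", child_argv);
        _exit(127); /* only reached if execv() failed */
    }

    /* Parent: read what the child printed. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < out_size - 1 &&
           (n = read(fds[0], out + used, out_size - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling it once with a relative string such as "./sh" and once with an absolute one such as "/made/up/dir/sh" should give back each string unchanged, even though the same file, /bin/sh, is executed both times. That is the same difference as between your shell, which passes the path as you typed it, and lldb, which passes the absolute path.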