This program (from Jonathan Bartlett's Programming from the Ground Up) cycles through all the numbers stored in memory with .long and puts the largest one in the EBX register, whose value becomes the program's exit status when it completes.
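    .section .data
    data_items:                        # the list of numbers; 0 marks the end
        .long 3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66, 0
    .section .text
    .globl _start
    _start:
        movl $0, %edi                  # %edi = index of the current item
        movl data_items(,%edi,4), %eax # load the first item
        movl %eax, %ebx                # the first item is the largest so far
    start_loop:
        cmpl $0, %eax                  # reached the terminating 0?
        je loop_exit
        incl %edi                      # advance to the next item
        movl data_items(,%edi,4), %eax # load it
        cmpl %ebx, %eax                # compare it with the current maximum
        jle start_loop                 # not larger: keep looping
        movl %eax, %ebx                # larger: it becomes the new maximum
        jmp start_loop
    loop_exit:
        movl $1, %eax                  # syscall 1 is exit (32-bit Linux);
        int $0x80                      # %ebx holds the exit status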
I'm not certain about the purpose of (,%edi,4) in this program. I've read that the commas are for separation, and that the 4 reminds the computer that each number in data_items is 4 bytes long. Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here? Also, could someone explain in more detail what purpose the two commas serve in this situation?
In AT&T syntax, memory operands have the following form¹:

    displacement(base_register, index_register, scale_factor)
The base, index and displacement components can be used in any combination, and every component can be omitted (an omitted scale factor defaults to 1; when present it must be 1, 2, 4 or 8). However, the commas must be retained if you omit the base register, otherwise the assembler would have no way to tell which of the components you are leaving out.
All these components are combined to calculate the address you are specifying, using the following formula:

    effective_address = displacement + base_register + index_register*scale_factor

(which, incidentally, is almost exactly how you would write it in Intel syntax, e.g. [data_items + edi*4]).
So, armed with this knowledge, we can decode your instruction:

    movl data_items(,%edi,4), %eax

Matching it against the syntax above, you see that:

- data_items is the displacement;
- base_register is omitted, so it does not appear in the formula above;
- %edi is index_register;
- 4 is scale_factor.

So, you are telling the CPU to move a long from the location data_items+%edi*4 to the register %eax.
The *4 is necessary because each element of your array is 4 bytes wide, so to turn the index (in %edi) into an offset in bytes from the start of the array you have to multiply it by 4.
Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here?
Assemblers are low-level tools that know nothing about types.
.long is not an array declaration; it is just a directive telling the assembler to emit the bytes of the 32-bit representation of its arguments. data_items is not an array either; it is just a symbol that gets resolved to some memory location, exactly like any other label. The fact that you placed a .long directive after it is of no particular significance to the assembler, which is why the instruction itself has to state the element size through the scale factor.

Notes