There's this piece of Keil assembly code in Valvano's book (3.3.3 Memory Access Instruction):
; Keil Syntax
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
;outside of execution
PAaddr DCD 0x400043FC
The first line, LDR R5, PAaddr, gets translated by the assembler to
LDR R5, [PC, #16]
where the #16 represents the number of bytes between the MOV R6, #0x55 and the DCD definition.
I can't understand how the #16 came about. According to Keil's ARM and Thumb Instructions, MOV is a 16-bit instruction (hence 2 bytes). I can't find the instruction size for STR or DCD, but from reading ARM's instruction set summary, STR takes twice as many cycles as MOV, so I would intuitively guess that STR's instruction size is double that of MOV (4 bytes). DCD just stores the value to the ROM, so it can't be any bigger than MOV. If I sum up the instruction sizes in bytes (2 for MOV, 4 for STR, and perhaps 1 or 2 for DCD), I should get 7 or 8 bytes from the second instruction to the last, or a #7 or #8 offset from PC instead.
I don't have Keil handy, but it doesn't really matter; you didn't provide enough information (what is your target architecture/core?), and not all of this is well documented by ARM.
So, generic Thumb:
.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC
Disassembly of section .text:
00000000 <PAaddr-0x8>:
0: 4d01 ldr r5, [pc, #4] ; (8 <PAaddr>)
2: 2655 movs r6, #85 ; 0x55
4: 602e str r6, [r5, #0]
6: 46c0 nop ; (mov r8, r8)
00000008 <PAaddr>:
8: 400043fc .word 0x400043fc
The immediate offset added to the Align(PC, 4) value of the instruction to form the address. Permitted values are multiples of four in the range 0-1020 for encoding T1.
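In Thumb state the PC reads as the address of the instruction plus 4, so Align(0x00+4, 4) = 0x04. 0x08 - 0x04 = 4 bytes = one word, so the immediate is 1: in the encoding 0x4D01, the 01 is the immediate.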
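The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not what any particular assembler does internally: the function names are made up for this example, and it assumes the architectural rule that in Thumb state the PC reads as the instruction's address plus 4, and that the 16-bit T1 encoding is `01001 Rt imm8`.

```python
def thumb_ldr_literal_imm8(instr_addr: int, literal_addr: int) -> int:
    """Return imm8 for a 16-bit Thumb LDR Rt, [PC, #imm] (encoding T1).

    Hypothetical helper for illustration only.
    """
    pc = instr_addr + 4        # Thumb: PC reads as instruction address + 4
    base = pc & ~0x3           # Align(PC, 4): clear the two low bits
    offset = literal_addr - base
    # T1 only allows multiples of four in the range 0..1020
    if offset < 0 or offset > 1020 or offset % 4 != 0:
        raise ValueError(f"offset {offset} not encodable in T1")
    return offset // 4         # the encoding stores the offset in words


def encode_ldr_literal_t1(rt: int, imm8: int) -> int:
    """Build the 16-bit halfword: 01001 | Rt (3 bits) | imm8 (8 bits)."""
    if not 0 <= rt <= 7:
        raise ValueError("T1 can only address R0-R7")
    return 0x4800 | (rt << 8) | imm8


# LDR R5 at address 0x00, literal at 0x08 (the first listing)
imm8 = thumb_ldr_literal_imm8(0x00, 0x08)
halfword = encode_ldr_literal_t1(5, imm8)
```

With the first listing's addresses, `imm8` comes out as 1 and `halfword` as 0x4D01, matching the disassembly.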
.thumb
nop
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC
00000000 <PAaddr-0x8>:
0: 46c0 nop ; (mov r8, r8)
2: 4d01 ldr r5, [pc, #4] ; (8 <PAaddr>)
4: 2655 movs r6, #85 ; 0x55
6: 602e str r6, [r5, #0]
00000008 <PAaddr>:
8: 400043fc .word 0x400043fc
The PC reads as the instruction address plus 4, so Align(0x02+4, 4) = Align(0x06, 4) = 0x04. 0x08 - 0x04 = 0x04, one word, so the encoding is again 0x4D01.
.cpu cortex-m3
.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC
Disassembly of section .text:
00000000 <PAaddr-0x8>:
0: 4d01 ldr r5, [pc, #4] ; (8 <PAaddr>)
2: 2655 movs r6, #85 ; 0x55
4: 602e str r6, [r5, #0]
6: bf00 nop
00000008 <PAaddr>:
8: 400043fc .word 0x400043fc
No change to the LDR encoding (only the padding nop differs), but with unified syntax:
.cpu cortex-m3
.syntax unified
.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC
Disassembly of section .text:
00000000 <PAaddr-0x8>:
0: 4d01 ldr r5, [pc, #4] ; (8 <PAaddr>)
2: f04f 0655 mov.w r6, #85 ; 0x55
6: 602e str r6, [r5, #0]
00000008 <PAaddr>:
8: 400043fc .word 0x400043fc
and with a leading nop:
.cpu cortex-m3
.syntax unified
.thumb
nop
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC
Disassembly of section .text:
00000000 <PAaddr-0xc>:
0: bf00 nop
2: 4d02 ldr r5, [pc, #8] ; (c <PAaddr>)
4: f04f 0655 mov.w r6, #85 ; 0x55
8: 602e str r6, [r5, #0]
a: bf00 nop
0000000c <PAaddr>:
c: 400043fc .word 0x400043fc
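The PC reads as the instruction address plus 4, so Align(0x02+4, 4) = Align(0x06, 4) = 0x04. 0x0C - 0x04 = 0x08, two words, so the immediate is 2 and the encoding is 0x4D02.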
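To see why the offset changes between the listings, here is a rough Python sketch of the layout step: it places the instructions back to back, pads to a 4-byte boundary for the literal, and computes the PC-relative offset. The function `literal_layout` is hypothetical, and it assumes the section starts at address 0 and that `.align` means 4-byte alignment, as in the listings above.

```python
def literal_layout(sizes, ldr_index):
    """Lay out instructions of the given byte sizes starting at address 0.

    Returns (literal_addr, padding_bytes, ldr_offset), where the literal
    word follows the last instruction, aligned to 4 bytes.
    Hypothetical helper for illustration only.
    """
    addrs = []
    addr = 0
    for size in sizes:
        addrs.append(addr)
        addr += size
    literal_addr = (addr + 3) & ~0x3         # .align: round up to a multiple of 4
    padding = literal_addr - addr            # bytes filled with a nop
    base = (addrs[ldr_index] + 4) & ~0x3     # Align(PC, 4), with PC = address + 4
    return literal_addr, padding, literal_addr - base


# ldr, movs, str: all 16-bit
print(literal_layout([2, 2, 2], ldr_index=0))
# nop, ldr, mov.w (32-bit), str
print(literal_layout([2, 2, 4, 2], ldr_index=1))
```

The first call gives a literal at 0x8 with 2 bytes of padding and an offset of 4; the second gives a literal at 0xC with 2 bytes of padding and an offset of 8, which is where the `#4` and `#8` in the disassembly come from.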
You can do the same things with Keil's assembly language instead of the GNU syntax shown above.
It's not your job to count unless you are writing your own assembler (or trying to create your own machine code for some other reason).
In any case, simply read the ARM architecture documentation for the architecture in question. Compare that to the output of a debugged assembler for further clarification as needed.
From the early/original ARM ARM:
address = (PC[31:2] << 2) + (immed_8 * 4)
Rd = Memory[address, 4]
This one makes more sense, IMO.
When in doubt, go back to the old/original-ish ARM ARM.
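As a sanity check, the older formula can be applied in reverse to decode a halfword from the disassembly. The sketch below is an illustration only: `decode_ldr_literal_t1` is a made-up name, and it assumes the 16-bit encoding `01001 Rd imm8` and that PC is the instruction address plus 4.

```python
def decode_ldr_literal_t1(halfword: int, instr_addr: int):
    """Decode a 16-bit Thumb LDR Rd, [PC, #imm] and return (rd, address).

    Hypothetical helper for illustration only.
    """
    if (halfword >> 11) != 0b01001:
        raise ValueError("not a 16-bit LDR (literal) encoding")
    rd = (halfword >> 8) & 0x7
    immed_8 = halfword & 0xFF
    pc = instr_addr + 4                        # Thumb: PC = instruction address + 4
    address = ((pc >> 2) << 2) + immed_8 * 4   # (PC[31:2] << 2) + (immed_8 * 4)
    return rd, address


rd, address = decode_ldr_literal_t1(0x4D02, 0x02)
```

For 0x4D02 at address 0x02 this yields R5 and address 0x0C, the location of PAaddr in the last listing.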
The most(ish) recent ARM ARM:
if ConditionPassed() then
    EncodingSpecificOperations(); NullCheckIfThumbEE(15);
    base = Align(PC,4);
    address = if add then (base + imm32) else (base - imm32);
    data = MemU[address,4];
    if t == 15 then
        if address<1:0> == '00' then LoadWritePC(data); else UNPREDICTABLE;
    elsif UnalignedSupport() || address<1:0> == '00' then
        R[t] = data;
    else // Can only apply before ARMv7
        if CurrentInstrSet() == InstrSet_ARM then
            R[t] = ROR(data, 8*UInt(address<1:0>));
        else
            R[t] = bits(32) UNKNOWN;
But that covers the T1, T2, and A1 encodings in one shot, making it the most confusing.
In any case, these manuals describe what is going on with the encoding as well as the overall size of each instruction.
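On the size question specifically: instruction size has nothing to do with cycle counts. On cores with Thumb-2, the first halfword tells you whether an instruction is 16 or 32 bits: if its top five bits are 0b11101, 0b11110, or 0b11111, a second halfword follows. A data word emitted by DCD or .word is not an instruction at all; it is simply 4 bytes. A small sketch, with the hypothetical function name `thumb_instr_size`:

```python
def thumb_instr_size(first_halfword: int) -> int:
    """Return the instruction size in bytes (2 or 4) for a Thumb-2 core.

    Hypothetical helper for illustration only.
    """
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


# First halfwords taken from the disassembly listings above
sizes = {hex(hw): thumb_instr_size(hw)
         for hw in (0x4D01, 0x2655, 0xF04F, 0x602E, 0xBF00)}
```

Only 0xF04F (the first half of `mov.w`) reports 4 bytes; the `ldr`, `movs`, `str`, and `nop` halfwords all report 2.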