I've been putting together my own disassembler for Sega Mega Drive ROMs, basing my initial work on the MOTOROLA M68000 FAMILY Programmer’s Reference Manual. Having disassembled a considerable chunk of the ROM, I've tried to reassemble the output with VASM, since its mot syntax module accepts Motorola assembly syntax.
Now, for the vast majority of the reassembly, this has worked well. However, there is one wrinkle with instructions whose effective address uses the "Program Counter Indirect with Index (8-Bit Displacement) Mode". Given that I'm only now learning Motorola 68000 assembly, I wanted to confirm my understanding and to ask: what is the proper syntax for these operations?
For example, if I have two words:
4ebb 0004
I've interpreted this as a JSR with the target destination being the sum of:
pc
0x04
d0
(Given that I am restricting myself to the 68000, I've elided any consideration of size and scale in the extension word).
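To make that reading of the two words concrete, here is a minimal Python sketch of the decode step. The function names are hypothetical (they are not taken from my disassembler), and the bit layout is the 68000 brief extension word as I understand it from the reference manual; the scale bits are ignored, as noted above.

```python
def is_pc_indexed(opcode):
    """True when the low six bits select mode 111, register 011.

    This only applies to instructions (such as JSR) whose effective
    address field occupies the low six bits of the opcode word.
    """
    return (opcode & 0x3F) == 0x3B


def decode_brief_extension(ext_word):
    """Split a 68000 brief extension word into (index register, size, displacement)."""
    reg_kind = "a" if ext_word & 0x8000 else "d"    # bit 15: 0 = data, 1 = address register
    reg_num = (ext_word >> 12) & 0x7                # bits 14-12: index register number
    index_size = "l" if ext_word & 0x0800 else "w"  # bit 11: 0 = sign-extended word, 1 = long
    disp = ext_word & 0xFF                          # bits 7-0: 8-bit displacement
    if disp & 0x80:                                 # sign-extend the displacement
        disp -= 0x100
    return f"{reg_kind}{reg_num}", index_size, disp


# The two words from the example: 4ebb 0004
print(is_pc_indexed(0x4EBB))
print(decode_brief_extension(0x0004))
```

Under those assumptions, the extension word 0004 decodes to index register d0 (word-sized) with a displacement of 4.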
Based on how this addressing mode is described in the reference manual, I've emitted this as:
jsr ($04,pc,d0)
However, when I feed this back into VASM it will emit the following error:
error 2030 in line X of "XXXX.asm": displacement out of range
> jsr ($04,pc,d0)
which seems a very strange error to emit, given that the displacement can't be known until runtime, due to the use of the d0 register. Playing around with this, it appears to use the first part of the operand ($04) as the absolute target destination, and calculates a different displacement based on that.
as
If I switch to GNU as, the syntax that provides identical output to the original ROM is:
jsr %pc@(0x04,%d0:w)
which appears to indicate that the first part of the operand is the displacement. However, when I disassemble this using objdump, the listed instruction is:
jsr %pc@(0x6,%d0:w)
which seems to indicate that, in the MIT syntax that as uses, the first part of the operand is once again the absolute address.
This confusion between the two syntaxes, and even between the as assembly and subsequent disassembly, makes me wonder what the correct syntax should be, or if perhaps instructions using this addressing mode tend to be generated by the assembler as part of macros or other higher level constructs.
Thinking about the points @tofro raised has put me in the right direction, and this is what I've arrived at:
Both of the assemblers I've tested (VASM and GNU as) will properly handle a label provided in what I had considered the "displacement" part of the operand, and will calculate the displacement based on the current PC and the destination label. Given the convenience of this from the programmer's point of view, and @tofro's observations, I'd say this is the way this kind of addressing is intended to be used.
So, assembling the following with vasm:
org $80
jsr (label,pc,d0)
nop
label:
nop
produces a listing file like so:
Sections:
00: "seg80" (80-88)
Source: "vasm-label.asm"
1: org $80
2:
00:00000080 4EBB0004 3: jsr (label,pc,d0)
00:00000084 4E71 4: nop
5:
6: label:
00:00000086 4E71 7: nop
8:
Symbols by name:
label A:00000086
Symbols by value:
00000086 label
and assembling the following with as:
.org 0x80
jsr %pc@(label,%d0:w)
nop
label:
nop
produces a listing file like so:
68K GAS as-label.asm page 1
1 0000 0000 0000 .org 0x80
1 0000 0000
1 0000 0000
1 0000 0000
1 0000 0000
2
3 0080 4EBB 0004 jsr %pc@(label,%d0:w)
4 0084 4E71 nop
5
6 label:
7 0086 4E71 nop
68K GAS as-label.asm page 2
DEFINED SYMBOLS
as-label.asm:6 .text:0000000000000086 label
NO UNDEFINED SYMBOLS
We can see that both assemblers output the same two words for the instruction (as per my original example):
4ebb 0004
Going forward, once all the labels have been properly identified in my disassembly, this will be the most user-friendly format to emit.
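As a rough illustration of emitting that label-based form from a disassembler, here is a minimal Python sketch. The function name, the syntax keys "mot" and "mit", and the label name loc_86 are all hypothetical. The operand shapes follow the two examples above, except that the Motorola form spells out the index size as d0.w, which is an assumption on my part (the vasm example above omits it).

```python
def format_pc_indexed(mnemonic, label, index_reg, index_size, syntax):
    """Render a PC-indexed operand that refers to its target by label.

    syntax: "mot" for Motorola style (vasm), "mit" for MIT style (GNU as).
    """
    if syntax == "mot":
        # Explicit size suffix is an assumption; the vasm example omits it.
        return f"{mnemonic} ({label},pc,{index_reg}.{index_size})"
    if syntax == "mit":
        return f"{mnemonic} %pc@({label},%{index_reg}:{index_size})"
    raise ValueError(f"unknown syntax: {syntax}")


print(format_pc_indexed("jsr", "loc_86", "d0", "w", "mot"))
print(format_pc_indexed("jsr", "loc_86", "d0", "w", "mit"))
```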
Where the two assemblers differ is in how they handle a literal number in that position: it comes down to whether they treat the provided "displacement" part of the operand as a displacement or as a destination address.
Going back to vasm, assembling:
org $80
jsr ($04,pc,d0)
nop
label:
nop
produces:
Sections:
00: "seg80" (80-88)
Source: "vasm-disp.asm"
1: org $80
2:
00:00000080 4EBB0082 3: jsr ($04,pc,d0)
00:00000084 4E71 4: nop
5:
6: label:
00:00000086 4E71 7: nop
8:
Symbols by name:
label A:00000086
Symbols by value:
00000086 label
showing that the provided displacement ($04) is treated as the base target of the operand, and a negative offset (0x82, i.e. -0x7e as a signed byte, measured from the extension word at $82) is calculated and emitted.
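The arithmetic behind that 0x82 can be sketched in a few lines of Python. This is only my model of what the listing shows, not vasm's actual implementation, and the function name is hypothetical. It also suggests where a "displacement out of range" error could come from: once the instruction sits more than a signed byte away from the literal target, no valid 8-bit displacement exists.

```python
def encode_disp8(target, instr_addr):
    """Compute the 8-bit displacement byte for (target,pc,Xn).

    The displacement is measured from the extension word, which sits
    2 bytes after the start of the instruction.
    """
    disp = target - (instr_addr + 2)
    if not -128 <= disp <= 127:
        raise ValueError("displacement out of range")
    return disp & 0xFF  # two's-complement byte as stored in the extension word


print(hex(encode_disp8(0x04, 0x80)))  # literal $04 treated as the target
print(hex(encode_disp8(0x86, 0x80)))  # label at $86 treated as the target
```

With the instruction at $80, a target of $04 gives the byte 0x82 and a target of $86 gives 0x04, matching the two vasm listings above.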
Contrast this with as, where assembling:
.org 0x80
jsr %pc@(0x04,%d0:w)
nop
label:
nop
produces:
68K GAS as-disp.asm page 1
1 0000 0000 0000 .org 0x80
1 0000 0000
1 0000 0000
1 0000 0000
1 0000 0000
2
3 0080 4EBB 0004 jsr %pc@(0x04,%d0:w)
4 0084 4E71 nop
5
6 label:
7 0086 4E71 nop
68K GAS as-disp.asm page 2
DEFINED SYMBOLS
as-disp.asm:6 .text:0000000000000086 label
NO UNDEFINED SYMBOLS
showing that the provided value (0x04) is considered the displacement, and directly encoded into the output bytes.
In my circumstances, being able to pass the full address of an unresolved label in the operand when using VASM is quite useful when fine-tuning the disassembly algorithm, so this is most likely what I'll be using for now.
In my opinion, both
jsr <displacement>(pc,<data register>)
and
jsr (<displacement>,pc,<data register>)
are correct syntax. BUT
No one would ever write such code into an assembler. What every assembler expects (monitor programs might be different) is a (relocatable) label instead of a literal number when calculating PC offsets. It would then calculate the numerical displacement from the distance between current PC and the label. You simply confused that mechanism.
You might find that if you use any other address register but PC that your syntax might be accepted. Most assemblers simply don't like literal PC offsets. You should expect something similar with short relative branches, like in
bra.s -4
EDIT:
My assemblers seem to understand such a construct if the displacement is explicitly marked as a relocatable address by relating it to "*" (the current PC) like in
jsr *-4(pc,d0.w)
(Didn't try vasm, though)