Environment: I was using an MS-DOS 6.22 virtual machine in VirtualBox.
Task: Using the built-in debug.exe program, which lives at C:\DOS\DEBUG.EXE, I wanted to write some instructions into memory. Below is a text version of the session, which I transcribed manually from a screenshot:
C:\>debug
-a 1000:0
1000:0000 mov ax,ffff
1000:0003 mov ds,ax
1000:0005 mov ax,2200
^error
1000:0006
-u 1000:0 6
1000:0000 b8ffff mov ax,ffff
1000:0003 8ed8 mov ds,ax
1000:0005 a5 movsw
1000:0006 42 inc dx
-q
It is also confusing that the instruction that triggered the error was turned into movsw at the same memory address.
I was trying to use the a command in debug.exe to write these instructions into memory:
mov ax,ffff
mov ds,ax
mov ax,2200
mov ss,ax
mov sp,0100
mov ax,[0]
mov bx,[2]
push ax
push bx
pop ax
pop bx
Later, I planned to use other commands to execute the instructions and observe the system's behavior, with the aim of learning assembly.
However, I got stuck when entering the mov ax,2200 instruction, where the program flagged an error at the a. I have no idea why this is happening.
I installed a new Windows 2000 virtual machine in VirtualBox and did the same thing in cmd (started debug.exe and used a to enter the instructions), and it worked. So the problem might be related to the MS-DOS 6.22 virtual machine I used?
But still, I'm curious about what might have gone wrong in MS-DOS 6.22, as this situation seems quite unusual to me. Could it be related to debug.exe? Or are there specific rules for setting up instructions in memory that I may not be aware of?
You seem to have chosen the address 1000:0 arbitrarily, but it doesn't belong to you, and you don't know what it might be used for. In fact, in this instance, it happens to contain some of DEBUG's own internal data, and so overwriting it causes its assembler to misbehave.
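As a rough illustration of where 1000:0 actually lands, here is a minimal Python sketch of the standard real-mode translation (segment times 16, plus offset). The helper name linear_address is made up for this example; it is not part of DEBUG or DOS.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# 1000:0 is linear address 0x10000, i.e. exactly 64 KiB into memory.
addr = linear_address(0x1000, 0x0000)
print(hex(addr))  # 0x10000
```

That is low conventional memory, the same region DOS uses for itself and for loaded programs, so there is no reason to expect it to be free.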
DOS is not a multitasking operating system and has no memory protection. Everything shares the same memory space, including the code you are trying to assemble, DEBUG itself, and the DOS kernel. So if you overwrite arbitrary memory, you may very well break something other than your own code.
Don't do that. If you just use the a command without giving an address, you'll assemble into memory that's set aside for you and can safely be written.
What happens specifically is the following. We can follow along in DEBUG's source code.
(The source code is for MS-DOS 4.0 rather than 6.22, but as I stepped through the execution in 6.22 using Bochs, the disassembly of the relevant parts matches the 4.0 source, so it evidently hadn't changed. The misbehavior was actually slightly different in my test than in yours, likely because DEBUG got loaded at a different address, but there's enough info in your question to infer what must have happened for you.)
The bytes you overwrote provide a hint: 43 58 5A 00 4A 4E 42 is "CXZ\0JNB". We are inside DEBUG's list of mnemonic strings, specifically here; they are separated by null bytes.
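If you want to check that decoding yourself, a few lines of Python will do it. This is only a sanity check of the ASCII values quoted above; the variable name is arbitrary.

```python
# The seven bytes quoted above, decoded as ASCII.
overwritten = bytes.fromhex("43 58 5A 00 4A 4E 42")

print(overwritten)                # b'CXZ\x00JNB'
print(overwritten.split(b"\0"))   # [b'CXZ', b'JNB']
```

The null byte in the middle is the separator between two adjacent entries in the list.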
Your first instruction, MOV AX, FFFF, is three bytes long, so it overwrites CXZ. That doesn't cause any tangible harm because you aren't attempting to assemble any JCXZ instructions.
Your second instruction, MOV DS, AX, however, overwrites the null byte at 1000:3, which was the separator between JCXZ and JNB. Now the damage is done.
The assembler's code to match mnemonics is here. It scans through the list of strings searching for a match, incrementing CX after each failure. So when it finally does find a match, CX is the index of the matched mnemonic in the list. This list is in the same order as the OPTAB table, which contains pointers to the functions that parse the instruction's operands.
When you overwrote the null byte following JCXZ, you effectively merged JCXZ and JNB into one mnemonic, and now the list is out of sync with OPTAB. So when we are trying to parse the third instruction, MOV, which is later in the list than JCXZ and JNB, we only increment CX once when we pass them. This means that when we do reach MOV in the list, CX has been incremented one time too few. So when we use that value (now in BX) as an index into OPTAB, it points not to the proper entry for MOV, but to the preceding one. Which is, you guessed it, MOVSW (or MOVW as the comment puts it).
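The off-by-one can be reproduced with a toy model in Python. To be clear, this is a sketch and not DEBUG's actual code or tables: the five mnemonics, the handler labels, and the helper names build_table and lookup are all hypothetical, chosen only so that JCXZ and JNB are adjacent and MOVSW sits just before MOV. The five code bytes are the ones shown in the disassembly in the question.

```python
# Toy model only: the five names and the handler labels are hypothetical,
# chosen so that JCXZ/JNB are adjacent and MOVSW sits just before MOV.
MNEMONICS = [b"JCXZ", b"JNB", b"MOVSB", b"MOVSW", b"MOV"]
OPTAB = ["jcxz", "jnb", "movsb", "movsw", "mov"]  # one handler per index


def build_table(names):
    """Pack the names into one null-separated byte string."""
    return bytearray(b"".join(name + b"\0" for name in names))


def lookup(table, mnemonic):
    """Scan the list, counting the entries passed before the match."""
    for index, entry in enumerate(bytes(table).split(b"\0")):
        if entry == mnemonic:
            return index
    return None


table = build_table(MNEMONICS)
before = OPTAB[lookup(table, b"MOV")]

# In this model 1000:0 corresponds to the "C" of JCXZ (offset 1).
# b8 ff ff = mov ax,ffff ; 8e d8 = mov ds,ax (bytes taken from the dump).
code = bytes.fromhex("b8ffff8ed8")
table[1:1 + len(code)] = code
after = OPTAB[lookup(table, b"MOV")]

print(before, after)  # mov movsw
```

Once the separator is gone, the scan passes one entry fewer before reaching MOV, so the index selects the handler one slot earlier.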
So we assemble the opcode for MOVSW, which is what you see in the dump. But MOVSW takes no operands, so the rest of the line, AX, 2200, is not parsed as an operand but as a new instruction. Since AX isn't a valid instruction mnemonic, we get an error.