debugging assembly virtualbox x86-16 dos

debug.exe from DOS 6.22 errored on assembling mov ax, imm16 with the `a` command; works in Win 2000

The details of my problem

Environment: I was using an MS-DOS 6.22 virtual machine in VirtualBox.

Task: Using the built-in debug.exe program, which lives at C:\DOS\DEBUG.EXE, I wanted to write some instructions into memory. The session is shown in the screenshot below:

The Error Screenshot

And below is a text version of the screenshot, which I transcribed manually:

C:\>debug
-a 1000:0
1000:0000 mov ax,ffff
1000:0003 mov ds,ax
1000:0005 mov ax,2200
              ^error
1000:0006 
-u 1000:0 6
1000:0000 b8ffff  mov ax,ffff
1000:0003 8ed8    mov ds,ax
1000:0005 a5      movsw
1000:0006 42      inc dx
-q

It is also confusing that the instruction which caused the error was assembled as movsw at that same memory address.

What I did

I was trying to use the a command in debug.exe to write these commands into memory:

mov ax,ffff
mov ds,ax

mov ax,2200
mov ss,ax

mov sp,0100

mov ax,[0]
mov bx,[2]

push ax
push bx

pop ax
pop bx

Later, I planned to use other commands to execute the instructions and observe the system's behavior, with the aim of learning assembly.

However, I was stuck at the step of entering the mov ax,2200 instruction, where the program indicated an error at the a. I have no idea why this is happening.

It worked in the Windows 2000 virtual machine!

I installed a new Windows 2000 virtual machine in VirtualBox and did the same thing in cmd (started debug.exe and used a to enter the instructions). And it worked! So the problem might be related to the MS-DOS 6.22 virtual machine I used?

What is expected

But still, I'm curious about what might have gone wrong in MS-DOS 6.22, as this situation seems quite unusual to me. Could it be related to debug.exe? Or are there specific rules for setting up instructions in memory that I may not be aware of?

Solution

You seem to have chosen the address 1000:0 arbitrarily, but it doesn't belong to you, and you don't know what it might be used for. In fact, in this instance, it happens to contain some of DEBUG's own internal data, and so overwriting it causes its assembler to misbehave.

DOS is not a multitasking operating system and has no memory protection. Everything shares the same memory space, including the code you are trying to assemble, DEBUG itself, and the DOS kernel. So if you overwrite arbitrary memory, you may very well break something other than your own code.

Don't do that. If you just use a without giving an address, you'll assemble into memory that's set aside for you, and can safely be written.

What happens specifically is the following. We can follow along in DEBUG's source code.

(The source code is for MS-DOS 4.0 rather than 6.22, but as I stepped through the execution in 6.22 using Bochs, the disassembly of the relevant parts matches the 4.0 source, so it evidently hadn't changed. The misbehavior was actually slightly different in my test than in yours, likely because DEBUG got loaded at a different address, but there's enough info in your question to infer what must have happened for you.)

The bytes you overwrote provide a hint: 43 58 5A 00 4A 4E 42 is "CXZ\0JNB". We are inside DEBUG's list of mnemonic strings, specifically here; they are separated by null bytes.

Your first instruction MOV AX, FFFF is three bytes long, so it overwrites CXZ. That doesn't cause any tangible harm because you aren't attempting to assemble any JCXZ instructions.

Your second instruction MOV DS, AX, however, overwrites the null byte at 1000:3, which was the separator between JCXZ and JNB. Now the damage is done.
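To make the overlap concrete, here is a small Python sketch (not part of DEBUG; the helper name `clobbered` is made up for illustration). It takes the original bytes quoted above and the two encodings shown in the question's `u` output, and reports which original bytes each instruction lands on, assuming assembly starts at offset 0 as in the question.

```python
# Original bytes at 1000:0000, as quoted above ("CXZ\0JNB").
original = bytes.fromhex("43 58 5A 00 4A 4E 42")

# Encodings taken from the question's disassembly listing.
assembled = [
    ("mov ax,ffff", bytes.fromhex("b8ffff")),
    ("mov ds,ax", bytes.fromhex("8ed8")),
]

def clobbered(original, assembled):
    """Return (instruction, start offset, original bytes overwritten) for each entry."""
    offset = 0
    result = []
    for text, code in assembled:
        result.append((text, offset, original[offset:offset + len(code)]))
        offset += len(code)
    return result

for text, offset, old in clobbered(original, assembled):
    print(f"{text:<12} at 1000:{offset:04X} overwrites {old!r}")
```

The second line of output shows the null separator among the bytes replaced by MOV DS, AX.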

The assembler's code to match mnemonics is here. It scans through the list of strings searching for a match, incrementing CX after each failure. So when it finally does find a match, CX is the index of the matched mnemonic in the list. This list is in the same order as the OPTAB table, which contains pointers to the functions that parse the instruction's operands.

When you overwrote the null byte following JCXZ, you effectively merged JCXZ and JNB into one mnemonic, and now the list is out of sync with OPTAB. So when we are trying to parse the third instruction MOV, which is later in the list than JCXZ and JNB, we only increment CX once when we pass them. This means that when we do reach MOV in the list, CX has been incremented one time too few. So when we use that value (now in BX) as an index into OPTAB, it points not to the proper entry for MOV, but to the preceding one. Which is, you guessed it, MOVSW (or MOVW as the comment puts it).
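The off-by-one can be reproduced with a toy model in Python. This is only a sketch of the idea, not DEBUG's actual code: the four-entry list, the `HANDLERS` table, and the function names are hypothetical stand-ins for the real mnemonic list and OPTAB, which are much longer.

```python
# Toy stand-ins: a null-separated mnemonic list and a parallel handler table.
NAMES = [b"JCXZ", b"JNB", b"MOVSW", b"MOV"]
HANDLERS = ["JCXZ", "JNB", "MOVSW", "MOV"]

def build_table(names):
    """Pack the mnemonics into one buffer, each terminated by a null byte."""
    return bytearray(b"\0".join(names) + b"\0")

def lookup(table, mnemonic):
    """Scan the null-separated entries, counting failures; return the match index."""
    index = 0
    for entry in bytes(table).split(b"\0"):
        if entry == mnemonic:
            return index
        index += 1
    return None

table = build_table(NAMES)
print(HANDLERS[lookup(table, b"MOV")])   # intact list: index 3

# Overwrite "CXZ", the separator, and the "J" of JNB with the five bytes
# of the first two instructions from the question.
table[1:6] = bytes.fromhex("b8ffff8ed8")
print(HANDLERS[lookup(table, b"MOV")])   # merged entry: index 2
```

With the separator gone, the first two entries count as one, so the index found for MOV selects the handler one slot earlier in the parallel table.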

So we assemble the opcode for MOVSW, which is what you see in the dump. But MOVSW takes no operands, so the rest of the line AX, 2200 is not parsed as an operand, but as a new instruction. Since AX isn't a valid instruction mnemonic, we get an error.