I wrote a Forth interpreter for the J1 CPU, which I am now porting to Z80. In the new version, a colon word is a list of addresses to be called by the interpreter.
My problem is how to insert literals into a colon definition.
Example:
: 1+ 1 + ;
When the compiler encounters the literal 1, it has no address to call that pushes the value onto the stack.
A solution would be to dynamically generate a routine like
ld hl, 1
push hl
ret
somewhere else and use that address in the colon definition.
Another solution is to use an invalid address (such as 0x0000) followed by the literal, which would be treated as a special case by the interpreter: upon encountering an address 0x0000, it simply pushes the value of the next two bytes onto the stack. In this case, the definition of 1+ would be like this in memory:
0x0000 ;
0x0001 ; literal
0x4500 ; + routine address
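The marker scheme can be modeled in a few lines. The Python sketch below is only an illustration under stated assumptions: `run`, `LIT_MARKER`, `PLUS_ADDR`, and `EXIT_ADDR` are hypothetical names, memory is a dict of 16-bit cells rather than raw bytes, and the address 0x4502 for ending a definition and the base address 0x8000 are invented for the example.

```python
# Toy model of an address-list interpreter with a reserved "literal" marker.
# All names and addresses here are illustrative assumptions.
LIT_MARKER = 0x0000   # reserved address: "the next cell is a literal"
PLUS_ADDR = 0x4500    # address of the + routine, as in the listing above
EXIT_ADDR = 0x4502    # hypothetical address that ends a colon definition
CELL = 2              # one cell = two bytes


def run(memory, ip, stack):
    """Walk the address list starting at ip until EXIT_ADDR is reached."""
    while True:
        addr = memory[ip]
        ip += CELL
        if addr == LIT_MARKER:
            # Special case: push the following cell instead of calling it.
            stack.append(memory[ip])
            ip += CELL
        elif addr == PLUS_ADDR:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFF)
        elif addr == EXIT_ADDR:
            return stack
        else:
            raise ValueError(f"unknown routine address {addr:#06x}")


# Body of 1+ laid out as in the listing: marker, literal, + routine, exit.
one_plus = {0x8000: LIT_MARKER, 0x8002: 0x0001,
            0x8004: PLUS_ADDR, 0x8006: EXIT_ADDR}
result = run(one_plus, 0x8000, [41])
```

The cost of this design shows up in the loop: every dispatched address has to be compared against the marker, even though most of them are ordinary routines.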
Is there a better solution?
Edit
In J1 machine code, 1+ compiles to this:
addr code comments
0125 8001 1 ; Bit 15=1: push the literal onto the stack
0126 628f %+ ; Combines the PLUS and RET opcodes
A classic approach is to manipulate the return address:
: (lit) ( -- x ) ( R: addr1 -- addr2 )
\ NB: "(lit)" should not be used as an ordinary word
r> dup cell+ >r @
;
: lit, ( x -- )
['] (lit) compile, ,
\ it is assumed that code space
\ is united with data space
;
: literal ( x -- x | )
state @ if lit, then
; immediate
The word (lit) above is also known simply as lit. Traditionally, a parenthesized name is used in Forth to emphasize that the word is for internal purposes only.
Let's consider a definition foo:
: foo 42 + ;
Assume that one cell takes two address units and that the body of foo starts at address 1024. Then the threaded code of the body consists of the following items:
1024: (lit)
1026: 42
1028: +
1030: exit
When (lit) is called from foo, the top value on the return stack is 1026, which is the address where the number 42 is stored. (lit) pops that value from the return stack, adds the cell size to a copy of it, and pushes the result (1028) back onto the return stack. It also fetches the value at address 1026, leaving 42 on the data stack. When (lit) returns, execution continues at 1028.
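The walk-through above can be replayed with a small model. This Python sketch is not a Forth system; it assumes a call-threaded interpreter in which calling a word pushes the address of the next cell onto the return stack, and the names `Machine`, `lit`, `plus`, and `run` are hypothetical. The `trace` list records, for each call, the address of the word and the address at which execution resumed.

```python
# Toy model of threaded code where each primitive can see the return stack.
# Names (Machine, lit, plus, run) and the layout are illustrative assumptions.
CELL = 2  # one cell = two address units, as in the example above


class Machine:
    def __init__(self):
        self.memory = {}   # address -> cell contents
        self.data = []     # data stack
        self.rstack = []   # return stack

    def lit(self):
        """Model of (lit):  r> dup cell+ >r @"""
        addr = self.rstack.pop()              # address of the inline value
        self.rstack.append(addr + CELL)       # resume after the inline value
        self.data.append(self.memory[addr])   # fetch the inline value

    def plus(self):
        b, a = self.data.pop(), self.data.pop()
        self.data.append(a + b)

    def run(self, ip):
        """Execute threaded code starting at ip until 'exit' is reached."""
        trace = []
        while True:
            word = self.memory[ip]
            if word == "exit":
                return trace
            self.rstack.append(ip + CELL)   # call: push address of next cell
            word()
            trace.append((ip, self.rstack[-1]))
            ip = self.rstack.pop()          # return: possibly adjusted address


m = Machine()
m.memory = {1024: m.lit, 1026: 42, 1028: m.plus, 1030: "exit"}
m.data.append(1)          # argument for foo
trace = m.run(1024)
```

After the run, `m.data` holds the sum and `trace` shows that the call at 1024 resumed at 1028, skipping the cell that holds 42.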
A value that follows an instruction in binary code, and on which the instruction's behavior depends, is sometimes called an "immediate argument" of that instruction. An instruction together with its immediate argument can be thought of as a single, larger instruction.
(lit) is not the only instruction with an immediate argument. Conditional branch and jump instructions usually have one as well.
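A conditional branch can take its target from the following cell in the same way. The sketch below is a hedged Python model, not code from any particular Forth: `zbranch`, `lit`, `run`, and the addresses are hypothetical, and the branch is assumed to be taken when the flag is zero, which is how the branch compiled by `if` behaves.

```python
# Toy model: a conditional branch whose target is an inline cell.
# Names (zbranch, lit, run) and the memory layout are illustrative assumptions.
CELL = 2


def lit(mem, data, rstack):
    addr = rstack.pop()
    rstack.append(addr + CELL)       # skip the inline value
    data.append(mem[addr])


def zbranch(mem, data, rstack):
    """Resume at the inline target if the top of the data stack is zero."""
    addr = rstack.pop()              # address of the inline target
    if data.pop() == 0:
        rstack.append(mem[addr])     # taken: resume at the target
    else:
        rstack.append(addr + CELL)   # not taken: resume after the target


def run(mem, ip, data):
    rstack = []
    while mem[ip] != "exit":
        rstack.append(ip + CELL)     # call: push address of the next cell
        mem[ip](mem, data, rstack)
        ip = rstack.pop()            # return: resume where the word left us
    return data


# Model of:  if 111 then   (pushes 111 only when the flag is non-zero)
code = {
    100: zbranch, 102: 108,   # flag == 0 -> continue at 108
    104: lit,     106: 111,
    108: "exit",
}
```

Both primitives use the same trick: they pop the return address, read the cell it points to, and push back the address where execution should continue.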
An alternative approach is to generate an anonymous primitive for every unique pair of an instruction and its immediate argument. I'm not sure if this approach has been used in practice.
See also: Open Interpreter: Portability of Return Stack Manipulations, M.L.Gassanenko, 1998