68000 Assembly – Passing Parameters via Stack for String Concatenation

I'm working on a Motorola 68000 assembly program that concatenates two strings using a subroutine. The challenge was to implement parameter passing via the stack for both input and output, so I focused on properly setting up and restoring the stack.

I developed the program logic with the help of Sep Roland and Erik Eidt. Afterward, I studied how to pass parameters using the stack, which is why my code is heavily commented.

Task Requirements:

Implement a subroutine in 68000 assembly that uses the stack for parameter passing.
The subroutine takes two input strings:
- A = "Hello"
- B = "World"
It concatenates them into an output string C, resulting in:
- C = "HelloWorld"
The main program should:
1. Prepare the stack by pushing parameters.
2. Call the subroutine.
3. Restore the stack correctly after the function returns.

My Implementation:

          ORG $8000
   
;DATA
StringA DC.B 'Hello',0    ; First string with a null terminator
StringB DC.B 'World',0    ; Second string with a null terminator
StringC DS.B 256          ; Buffer for the concatenated string 

START: 

; This walkthrough assumes the stack pointer (A7) starts at address $8000.
; On the 68000, A7 points to the most recently pushed value (the top of the
; stack); a push first decrements A7 and then stores at the new address.

      pea.l StringC ; Equivalent to [move.l #StringC, -(a7)]
                    ; The stack pointer (A7) is decremented by 4 (pushing a longword = 4 bytes)
                    ; Initial A7 = $8000, now A7 = $7FFC
      
      pea.l StringB  ; A7 = $7FF8
      pea.l StringA  ; A7 = $7FF4

; Therefore, the stack (from lowest to highest address) contains:
; $7FF4  |StringA address|  <- A7 points here
; $7FF8  |StringB address| 
; $7FFC  |StringC address| 
; $8000  (original A7 value before the push operations)

      bsr.s CopyStrings     ; Call the subroutine; bsr pushes the return address
                            ; onto the stack
                                
; When executing bsr.s, the processor:
; - Saves the return address (PC) on the stack (another 4 bytes subtracted from A7).
; - Then branches to CopyStrings.

; Upon returning from the subroutine, rts pops the return address, so A7 is
; back at $7FF4 and points at the StringA parameter again. The three parameters
; (StringA, StringB, StringC) are still on the stack, and the caller must remove them.

      addq.l #8,a7  ; Restore 8 bytes of the stack
      addq.l #4,a7  ; Restore the remaining 4 bytes (total 12 bytes)

      SIMHALT 

CopyStrings:
      ; At the entry of the subroutine, the stack looks like this:
      ; A7    |Return Address | 
      ; A7+4  |StringA Address| 
      ; A7+8  |StringB Address| 
      ; A7+12 |StringC Address|
      
      move.l 4(a7),a0  ; Retrieve the address of StringA 
      move.l 8(a7),a1  ; Retrieve the address of StringB
      move.l 12(a7),a2 ; Retrieve the address of StringC 
      
CopyA: 
      move.b (a0)+,(a2)+  ; Copy one byte from StringA to StringC;
                          ; move sets the Z flag when that byte is zero
      bne.s CopyA         ; If the byte was not the null terminator, keep copying
      subq.l #1,a2        ; Move back 1 byte to overwrite the null terminator

CopyB:
      move.b (a1)+,(a2)+  ; Copy one byte from StringB to StringC
      bne.s CopyB         ; Loop until the null has been copied (it terminates StringC)
      rts                 ; Return from subroutine
    
     END START

Questions:

Is my approach to passing parameters via the stack correct?
Are there any optimizations or best practices I should consider?

Any feedback would be greatly appreciated!

Solution

Your approach is correct for C-style parameter passing on the stack. The C convention (especially in older C) pushes parameters in reverse order so that they appear in forward order in memory, with the first parameter at the lowest address, just above the return address. This is particularly helpful for variadic functions such as printf, because the callee can always find the first parameter at a fixed offset. The caller also cleans up the pushed parameters, which again helps variadic functions. Older C compilers treated every function as potentially variadic, since function prototypes were not required in the early days. That meant you could omit parameters (e.g. optional ones) or pass extra ones, and since only the caller knows what it pushed, the caller is responsible for popping.

Pascal, on the other hand, has no variadic functions, so it passes parameters in forward order and has the callee remove them from the stack on return. Since the return address sits on top of the passed parameters, the chip designers added a return-and-deallocate instruction, rtd, which pops the return address, discards a given number of argument bytes, and then returns. Note that rtd arrived with the 68010, so it is not available on the original 68000. (Without that instruction, a callee-cleanup epilogue has to pop the return address into a register, pop the arguments off the stack, and then jump indirectly through that register.)
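To make the callee-cleanup idea concrete, here is a minimal sketch for the plain 68000. It assumes the StringA, StringB and StringC labels from the question; CopyStringsP, PCopyA and PCopyB are hypothetical names introduced only for this illustration, and the pushing order is simply one plausible Pascal-style layout.

```asm
; Sketch only: CopyStringsP is a hypothetical callee-cleanup variant.
; Reuses StringA/StringB/StringC from the question.

      pea.l StringA           ; Pascal order: first parameter pushed first
      pea.l StringB
      pea.l StringC           ; last parameter ends up next to the return address
      bsr.s CopyStringsP
                              ; no cleanup here: the callee already removed 12 bytes
      SIMHALT

CopyStringsP:
      ; On entry: (a7)=return address, 4(a7)=StringC, 8(a7)=StringB, 12(a7)=StringA
      movea.l 12(a7),a0       ; a0 = StringA
      movea.l 8(a7),a1        ; a1 = StringB
      movea.l 4(a7),a2        ; a2 = StringC
PCopyA:
      move.b (a0)+,(a2)+      ; copy StringA including its null
      bne.s PCopyA
      subq.l #1,a2            ; step back onto the null
PCopyB:
      move.b (a1)+,(a2)+      ; copy StringB including its null
      bne.s PCopyB

      ; 68000 epilogue: no rtd available, so do it by hand
      movea.l (a7)+,a0        ; pop the return address (a0 is free again)
      lea 12(a7),a7           ; discard the three longword parameters
      jmp (a0)                ; return to the caller

      ; On a 68010 or later, the three instructions above could be replaced by:
      ;     rtd #12
```

The manual epilogue costs a scratch address register, which is one reason caller cleanup is the more common choice on the original 68000.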

I would argue that newer C can clearly distinguish variadic functions from functions taking fixed arguments, both at the definition (i.e. when generating the return/epilogue code) and at each call site. So a newer compiler, while probably still pushing parameters in reverse order, could use rtd for non-variadic functions.

Moreover, a modern calling convention for the 68k would probably pass at least six items in registers: three in d0-d2 and three in a0-a2, chosen by type (integers in data registers, pointers in address registers). Parameters that do not fit would go on the stack, and variadic functions might still pass everything on the stack.
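For comparison, a register-passing version of the same routine might look like the sketch below. The register assignment (a0, a1, a2 for the three pointers) is an assumption made for illustration, not an established ABI, and CopyStringsR, RCopyA and RCopyB are hypothetical names. The data labels are the ones from the question.

```asm
; Sketch only: pointers are passed in a0-a2 instead of on the stack.

      lea StringA,a0          ; first source string
      lea StringB,a1          ; second source string
      lea StringC,a2          ; destination buffer
      bsr.s CopyStringsR
                              ; nothing was pushed, so there is nothing to clean up
      SIMHALT

CopyStringsR:
RCopyA:
      move.b (a0)+,(a2)+      ; copy StringA including its null
      bne.s RCopyA
      subq.l #1,a2            ; step back onto the null
RCopyB:
      move.b (a1)+,(a2)+      ; copy StringB including its null
      bne.s RCopyB
      rts
```

The call site shrinks to three lea instructions and a bsr, and the subroutine no longer needs to read its arguments from memory.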

Your function has no output or return value. If it did, and you wanted it passed via the stack, the caller could push a zero or reserve an uninitialized word/longword on the stack before pushing the parameters. The callee could then still use rtd to return and deallocate everything except the return value, which the caller pops afterwards.
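As a sketch of a stack-based return value, the variant below returns the length of the concatenated string in a slot the caller reserves. It keeps caller cleanup so that it assembles for a plain 68000. CopyStringsLen, LCopyA and LCopyB are hypothetical names, and returning the length is an assumption chosen purely for illustration.

```asm
; Sketch only: the result (length of StringC) comes back in a stack slot.

      subq.l #4,a7            ; reserve an uninitialized longword for the result
      pea.l StringC
      pea.l StringB
      pea.l StringA
      bsr.s CopyStringsLen
      lea 12(a7),a7           ; caller removes the three parameters
      move.l (a7)+,d0         ; pop the result: d0 = 10 for "HelloWorld"
      SIMHALT

CopyStringsLen:
      ; On entry: (a7)=return address, 4(a7)=StringA, 8(a7)=StringB,
      ;           12(a7)=StringC, 16(a7)=result slot
      move.l 4(a7),a0
      move.l 8(a7),a1
      move.l 12(a7),a2
LCopyA:
      move.b (a0)+,(a2)+      ; copy StringA including its null
      bne.s LCopyA
      subq.l #1,a2            ; step back onto the null
LCopyB:
      move.b (a1)+,(a2)+      ; copy StringB including its null
      bne.s LCopyB
      subq.l #1,a2            ; a2 now points at the final null
      suba.l 12(a7),a2        ; a2 = end - start = string length
      move.l a2,16(a7)        ; store the result in the caller's slot
      rts                     ; with callee cleanup on a 68010+: rtd #12
```

Because the result slot sits above the parameters, removing the 12 parameter bytes leaves it on top of the stack, whether the caller or the callee does the cleanup.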

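There is also some question as to whether popping with two addq instructions (16 bits each) is better than a single 32-bit instruction. Note that addi cannot target an address register, so the single instruction would be lea 12(a7),a7 or adda.w #12,a7. I would have opted for the single instruction to reduce instruction count. I don't know the exact timings across the various models of the 68k family, but the original 68000 timing tables list lea with a 16-bit displacement at 8 cycles, against 8 cycles for each addq.l to an address register, so I expect it to be the same or faster.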
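A single-instruction cleanup for the call site in the question could be sketched as follows. It uses lea, since that is one of the forms that can take a7 as its destination.

```asm
      pea.l StringC
      pea.l StringB
      pea.l StringA
      bsr.s CopyStrings
      lea 12(a7),a7           ; one 4-byte instruction: a7 = a7 + 12
                              ; same size alternative: adda.w #12,a7
      SIMHALT
```

Both forms occupy the same 4 bytes as the two addq.l instructions they replace, so the gain is in instruction count rather than code size.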