assemblyoptimizationmicro-optimizationmotorola68000

68000 Assembly – Build a String from Characters *not* Present in Another & Return Its Length (stack-passed params)


Task

Develop a program for the Motorola 68000 that

  1. Creates a string C containing every character from A that does not appear anywhere in B.
  2. Computes the length of this new string C.

All logic must live in one or more sub-routines with the parameters passed via the stack.


I leaned heavily on the code from my previous question (where I concatenated / compared strings with lots of help from users :-) ) and collected all the optimization tips from the comments here. Below is the new routine that re-uses that calling convention and incorporates every suggestion.

          ORG   $8000

; ---------- data ----------
StringA   DC.B 'I say Hello',0
StringB   DC.B 'I say World',0
StringC   DS.B 256          ; caller must guarantee size
LenRes    DS.W 0            ; length of C (result)

; ---------- main ----------
START:
    PEA.L  StringC          ; &C
    PEA.L  StringB          ; &B
    PEA.L  StringA          ; &A
    BSR.S  FilterAndLength  ; build C, return len in D0
    MOVE.W D0,LenRes        ; store length
    ADDA.L #12,A7           ; pop params
    SIMHALT

; ---------- FilterAndLength(&A,&B,&C) ----------
; OUT: D0 = length of C
; MOD: D1-D2, A0-A3
FilterAndLength:
    MOVEA.L 4(A7),A0        ; A0 -> A
    MOVEA.L 8(A7),A1        ; A1 -> B (base)
    MOVEA.L 12(A7),A2       ; A2 -> C
    CLR.W   D0              ; len = 0

NextCharA:
    MOVE.B  (A0)+,D1        ; next char from A
    TST.B   D1
    BEQ     Finish          ; end of A
    MOVEA.L A1,A3           ; restart scan of B

ScanB:
    MOVE.B  (A3)+,D2
    TST.B   D2
    BEQ     NotFound        ; ran off end of B
    CMP.B   D2,D1
    BEQ     NextCharA       ; found in B, skip char
    BRA     ScanB

NotFound:                   ; char not in B → keep it
    MOVE.B  D1,(A2)+
    ADDQ.W  #1,D0
    BRA     NextCharA

Finish:
    CLR.B   (A2)+           ; zero-terminate C
    RTS

          END   START

Questions

  1. Can this be shortened or further optimized?
  2. Did I interpret the assignment correctly? (skip any character in A that appears at least once in B, build C, return its length)
  3. Extras worth adding? e.g. overflow checks, faster word/long moves if buffers are aligned, or splitting the “char-in-string” test into its own subroutine?

Thanks for any feedback!

What I didn't enter (voluntarily) and explain why [hoping it's the right reason]

If any of these extensions would be useful, let me know and I can integrate them!


Edit — @chtz’s latest suggestions

This is the version that integrates all the improvements listed in the answer @chtz wrote to me [I hope I understood correctly]:

          ORG    $8000

; ---------- data ----------
StringA   DC.B  'I say Hello',0
StringB   DC.B  'I say World',0
StringC   DS.B  256                 ; caller guarantees size
LenRes    DS.W  0                   ; |C|

; ---------- main ----------
START:
    PEA.L  StringC                  ; &C
    PEA.L  StringB                  ; &B
    PEA.L  StringA                  ; &A
    BSR.S  FilterLen_Bitmap         ; build C, length → D0
    MOVE.W D0,LenRes
    ADDA.L #12,A7
    SIMHALT

; ---------------------------------------------------------------
; FilterLen_Bitmap(&A,&B,&C)  –  D0 = length of C
; Stack layout after bitmap push:
;   +32 return  +36 &A  +40 &B  +44 &C
; ---------------------------------------------------------------
; Reg-use: A0/A1, D0-D2.  No other registers need saving.
; ---------------------------------------------------------------
FilterLen_Bitmap:

    ; ---- reserve & clear 32-byte bitmap (8×CLR.L -(A7)) ----
    MOVEQ   #7,D1
ClrBmp: CLR.L  -(A7)
        DBF    D1,ClrBmp           ; A7 → bitmap base

    ; ---- build bitmap from StringB ----
    CLR.W   D0                     ; upper bytes = 0
    MOVEA.L 40(A7),A0              ; A0 = &B
    MOVE.B  (A0)+,D0
    BEQ.S   BuildDone              ; empty B → skip
Bloop: MOVE.W  D0,D1
       LSR.W   #3,D1               ; byte offset 0-31
       BSET    D0,(A7,D1.W)        ; set bit
       MOVE.B  (A0)+,D0
       BNE.S   Bloop
BuildDone:

    ; ---- filter A into C ----
    MOVEA.L 36(A7),A0              ; A0 = &A
    MOVEA.L 44(A7),A1              ; A1 = &C
    CLR.W   D0                     ; length counter = 0
    CLR.W   D1
    MOVE.B  (A0)+,D1
    BEQ.S   Finish
Aloop: MOVE.W  D1,D2
       LSR.W   #3,D2               ; bitmap byte offset
       BTST    D1,(A7,D2.W)        ; char present in B?
       BNE.S   Skip
       MOVE.B  D1,(A1)+            ; keep char
       ADDQ.W  #1,D0
Skip:  MOVE.B  (A0)+,D1
       BNE.S   Aloop

Finish:
    CLR.B   (A1)                   ; null-terminate C
    ADDA.W  #32,A7                 ; pop bitmap
    RTS

          END   START

Is the code okay now or is there still something that can be improved or did I miss something? If there are any errors, I will try to fix them.


Solution

  • Further improvements relative to your edited question (Revision 2, everything untested):

    For clearing 32 bytes on the stack, instead of:

        SUBA.L  #32,A7            ; reserve 32 bytes for bitmap
        LEA     (A7),A4           ; A4 = bitmap base
    
        ; ---- clear bitmap (8×CLR.L = 32 B) ----
        MOVEQ   #7,D3
    ClrBmp: CLR.L  (A4)+
            DBF    D3,ClrBmp
        LEA     (A7),A4           ; restore A4 = base
    

    Do it the other way around directly by decrementing the stackpointer:

            MOVEQ   #7,D3
    ClrBmp: CLR.L  -(A7)
            DBF    D3,ClrBmp
    

    That loop implementation needs 8 bytes, you could unroll it to 8 instructions taking 16 bytes:

        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
        CLR.L  -(A7)
    

    EDIT: You can actually save some cycles using MOVEM.L for clearing the memory, e.g.

        MOVEQ   #0,D0
        MOVEQ   #0,D1     ; this also makes a later CLR.W D1 redundant
        MOVEA.L D0,A0     ; Clear any 4 registers you don't mind getting destroyed
        MOVEA.L D0,A1     ;
        MOVEM.L D0-D1/A0-A1,-(A7)  ; Clear 4*4=16 bytes of memory
        MOVEM.L D0-D1/A0-A1,-(A7)  ; Clear next 16 bytes of memory
    

    This technique would also allow to set memory to 0xff, using MOVEQ #-1,D0, etc.


    Since you don't change A7 afterwards, there is no need to copy it into A4

    For the B-loop, instead of:

        ; ---- populate bitmap with bytes of B ----
        MOVEA.L 12+32(A7),A1      ; A1 = &B
    Bloop: MOVE.B  (A1)+,D1
           BEQ     Bdone
           MOVE    D1,D2
           LSR.W   #3,D2          ; byte offset 0-31
           MOVE    D1,D3
           ANDI.B  #7,D3          ; bit index 0-7
           BTST    D3,(A4,D2.W)   ; already set? (optional)
           BSET    D3,(A4,D2.W)   ; set bit
           BRA     Bloop
    Bdone:
    

    There is one mistake, that you need to make sure that the upper byte of D2.W needs to be cleared before you copy D1.B into it (or it must be cleared in D1.W). Than, BTST is of course redundant here (you don't even use the result). Also the ANDI.B #7,D3 is not necessary, since BSET only considers the lowest 3 bits of D3 anyways. This also makes copying to D3 not needed.

    Another small improvement, is to move the condition of the loop to the end, having just one unconditional jump at the beginning, instead of once per loop:

        CLR.W   D1                ; unless you have a register which was already cleared
        MOVEA.L 12+32(A7),A1      ; A1 = &B
        BRA.S   Bstart
    Bloop: MOVE.W  D1,D2
           LSR.W   #3,D2          ; byte offset 0-31
           BSET    D1,(A7,D2.W)   ; set bit
    Bstart: MOVE.B  (A1)+,D1
           BNE.S   Bloop
    

    With one instruction (2 bytes) more, you could also copy the end of the loop to the beginning, saving the unconditional jump as well. Also using D0/D1 instead of D1/D2 can save an instruction later:

        CLR.W   D0                ; unless you have a register which was already cleared
        MOVEA.L 12+32(A7),A1      ; A1 = &B
        MOVE.B  (A1)+,D0
        BEQ.S   Bend              ;;;; Actually this would be a special case, 
                                  ;;;; if B is empty you could just copy A
    Bloop: MOVE.W  D0,D1
           LSR.W   #3,D1          ; byte offset 0-31
           BSET    D0,(A7,D1.W)   ; set bit
           MOVE.B  (A1)+,D0
           BNE.S   Bloop
    Bend:  ; Note that D0.W and the upper byte of D1.W are 0 now.
    

    Similar optimizations can be done for the A-loop:

        ; ---- filter A against bitmap, build C ----
        MOVEA.L 16+32(A7),A0      ; A0 = &A
        MOVEA.L 20+32(A7),A2      ; A2 = &C   <--- Here you could use A1 instead
        ;; CLR.W   D0             ; length = 0. Already the case after previous loop
        MOVE.B  (A0)+,D1          ; upper byte of D1.W was already cleared
        BEQ.S   Finish
    Aloop:
        MOVE.W  D1,D2
        LSR.W   #3,D2             ; byte offset
        BTST    D1,(A7,D2.W)
        BNE.S   NextA             ; present in B → skip
        MOVE.B  D1,(A2)+          ; keep char
        ADDQ.W  #1,D0
    NextA:
        MOVE.B  (A0)+,D1
        BNE.S   Aloop
    
    Finish:
        CLR.B   (A2)              ; null-terminate C
        ADDA.W  #32,A7            ; drop bitmap 
             ;; ADDA.W extends the argument to 32bit before adding, alternative:
             ;  LEA   32(A7),A7
        MOVEM.L (A7)+,D2-D3/A2-A4 ; restore regs
        RTS
    

    With my modifications you don't need to store/restore the same registers (if only 1 register needs saving, this also does not require a MOVEM.L.

        MOVE.W  D3,-(A7)   ;; If only D3.W needs (re)storing
    ; ... rest of code
        MOVE.W  (A7)+,D3   ;; If only D3.W needs (re)storing
        RTS