I’m writing an M68k assembly program in Easy68k. I’m trying to calculate the scalar product (dot product) of two arrays of 16-bit words (A and B). However, my LOOP becomes infinite, and I can’t figure out why.
Here’s the code:
ORG $8000
START:
clr.l d0
lea.l A,a0
lea.l B,a1
moveq #len,d1
LOOP:
move.w (a0)+,d5
move.w (a1)+,d6
mulu.w d5,d6
add.l d6,d0
subq.l #1,d1
bne.s LOOP
move.l d0,scalar_prod
SIMHALT
; DATA
A DC.W 10,20,30,40
B DC.W 50,60,70,80
len EQU 4
scalar_prod DC.L 0
END START
I thought the loop should stop when both d1 and d2 are decremented to zero, but it doesn’t terminate as expected. Can anyone help me understand what might be going wrong here?
Fixed by using move.b #len,d1
I’d also appreciate any suggestions on how to optimize this code. Thanks! :-)
Fixed by using move.b #len,d1
That's not entirely correct. Because you decrement and test the counter with:
subq.l #1,d1
bne.s LOOP
the zero flag reflects all 32 bits of D1, and bits 8 through 31 could contain garbage (move.b only loads the lowest byte). The moveq #len,d1 from your final edit always loads the full longword (it sign-extends its 8-bit immediate to 32 bits), and it does so even without a size specifier.
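To make the difference concrete, here is a minimal sketch that can be single-stepped in the Easy68k simulator. The starting value $12345600 is a made-up stand-in for whatever happened to be left in D1; it is an assumption for illustration, not something taken from your program.

```asm
len     EQU     4

        ORG     $8000
START:
        move.l  #$12345600,d1   ; hypothetical stale contents of D1
        move.b  #len,d1         ; replaces bits 0-7 only: D1 = $12345604
        subq.l  #1,d1           ; D1 = $12345603, Z is clear
        moveq   #len,d1         ; sign-extends to 32 bits: D1 = $00000004
        subq.l  #1,d1           ; D1 = $00000003, Z is set after 3 more decrements
        SIMHALT
        END     START
```

With upper bits like these, a subq.l / bne.s loop would not be truly infinite, but it would run roughly $12345604 times and read far past the end of both arrays, which looks the same from the outside.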
I’d also appreciate any suggestions on how to optimize this code. Thanks! :-)
move.w (a0)+,d5 8 clocks
move.w (a1)+,d6 8 clocks
mulu.w d5,d6 70 clocks (max)
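There's no real need to load D5 first and then multiply it into D6. You can let mulu read its operand straight from memory, which saves 4 clocks per iteration (at most 82 instead of 86) and frees a register: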
move.w (a1)+,d6 8 clocks
mulu.w (a0)+,d6 74 clocks (max)
subq.l #1,d1 8 clocks
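The byte and word forms take only half as many clocks. With a byte-sized subq the zero flag reflects only the low byte of D1, so for counts that fit in a byte it is better to write: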
subq.b #1,d1 4 clocks
Putting both changes together:
START:
lea.l A, a0
lea.l B, a1
clr.l d0
moveq.l #len, d1
LOOP:
move.w (a1)+, d2
mulu.w (a0)+, d2
add.l d2, d0
subq.b #1, d1
bne.s LOOP
move.l d0, scalar_prod
SIMHALT
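One further option, which goes beyond the two changes above, is to let dbra do the decrement and the branch in a single instruction. The following is only a sketch under a few assumptions: len is at least 1 and small enough for moveq (at most 127), and the counter is loaded with len-1 because dbra keeps looping until the low word of D1 reaches -1. On a 68000 a taken dbra costs 10 clocks, compared with 4 + 10 for subq.b followed by a taken bne.s.

```asm
len         EQU     4

            ORG     $8000
START:
            lea.l   A,a0
            lea.l   B,a1
            clr.l   d0              ; running sum
            moveq   #len-1,d1       ; dbra stops at -1, so start from len-1
LOOP:
            move.w  (a1)+,d2        ; d2 = B[i]
            mulu.w  (a0)+,d2        ; d2 = A[i] * B[i], unsigned 32-bit product
            add.l   d2,d0           ; accumulate
            dbra    d1,LOOP         ; decrement d1.w, branch unless it became -1
            move.l  d0,scalar_prod
            SIMHALT

A           DC.W    10,20,30,40
B           DC.W    50,60,70,80
scalar_prod DC.L    0

            END     START
```

For the data shown, scalar_prod should end up as 7000 ($1B58), the same value the subq.b version produces. Note that dbra only looks at the low word of D1, so the upper half of the register does not matter here.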