assembly motorola 68000 easy68k

Infinite Loop Issue in Scalar Product Calculation of Two Word Arrays in Easy68k


I’m working on an assembly program using Easy68k and programming in M68k assembly. I’m trying to calculate the scalar product (dot product) of two arrays of 16-bit words (A and B). However, I’m encountering an issue where my LOOP becomes infinite, and I can’t figure out why.

Here’s the code:

           ORG $8000

START:
      clr.l d0
      lea.l A,a0
      lea.l B,a1 
      moveq #len,d1   

LOOP:
      move.w (a0)+,d5  
      move.w (a1)+,d6  
      mulu.w d5,d6  
      add.l d6,d0  
      subq.l #1,d1    
      bne.s LOOP 

      move.l d0,scalar_prod   
      SIMHALT

    ; DATA
A DC.W 10,20,30,40 
B DC.W 50,60,70,80  
len EQU 4
scalar_prod DC.L 0  
    
    END START

The Issue

I thought the loop should stop once d1 has been decremented to zero, but it doesn’t terminate as expected. Can anyone help me understand what might be going wrong here?

Edit: Fixed by using move.b #len,d1

Optimization Suggestions:

I’d also appreciate any suggestions on how to optimize this code. Thanks! :-)


Solution

  • "Fixed by using move.b #len,d1"

    That's not entirely correct. Because you decrement and test the counter with:

    subq.l #1,d1    
    bne.s LOOP 
    

    the zero flag reflects all 32 bits of D1, and bits 8 through 31 could contain garbage, since move.b only writes the lowest byte. The moveq #len,d1 from your final edit always loads the full longword (the 8-bit immediate is sign-extended to 32 bits), even without a size specifier.
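
    The effect can be reproduced in isolation. The following is a minimal sketch, assuming D1 still holds stale data from earlier code; the value $12345600, the labels NOT_ZERO and DONE, and the use of D7 as a marker are made up for the illustration:

    ```asm
                ORG     $8000

    len         EQU     4

    START:
                move.l  #$12345600,d1   ; assumption: stale data left in D1
                move.b  #len,d1         ; only bits 0..7 change: D1 = $12345604
                subq.l  #4,d1           ; D1 = $12345600: low byte is 0, Z is clear
                bne.s   NOT_ZERO        ; taken, the longword as a whole is non-zero
                moveq   #0,d7           ; not reached
                bra.s   DONE
    NOT_ZERO:
                moveq   #1,d7           ; D7 = 1 marks that the branch was taken
    DONE:
                SIMHALT

                END     START
    ```

    With subq.b in place of subq.l, the flags would be set from the low byte only and the branch would fall through.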

    "I’d also appreciate any suggestions on how to optimize this code." Consider the clock counts on a 68000 for the first three instructions of the loop:

    move.w (a0)+,d5     8 clocks
    move.w (a1)+,d6     8 clocks
    mulu.w d5,d6       70 clocks (max)
    

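    There's no real need to first load D5 and then multiply into D6, since mulu accepts a memory source operand. That brings the worst case down from 86 to 82 clocks and frees a register. You can write it as: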

    move.w (a1)+,d6         8 clocks  
    mulu.w (a0)+,d6        74 clocks (max)
    
    Next, the counter update:

    subq.l #1,d1        8 clocks
    

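    On a data register, the byte and word forms of subq take only half as many clocks. Since moveq can only load counts up to 127, the counter fits in the low byte of D1 and the upper bits are zero, so it is safe to write: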

    subq.b #1,d1            4 clocks
    
    Putting both changes together:
    
    START:
          lea.l   A, a0
          lea.l   B, a1 
          clr.l   d0
          moveq.l #len, d1   
    LOOP:
          move.w  (a1)+, d2
          mulu.w  (a0)+, d2
          add.l   d2, d0
          subq.b  #1, d1    
          bne.s   LOOP
          move.l  d0, scalar_prod   
          SIMHALT
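
    As a further option that goes beyond the changes above, the separate decrement and branch can be folded into a single dbra. This is a sketch rather than a drop-in replacement: it assumes len is at least 1, keeps the unsigned mulu of the original, and reuses the data from the question. Since dbra stops when the low word of the counter reaches -1, the counter starts at len-1.

    ```asm
                ORG     $8000

    len         EQU     4

    START:
                lea.l   A,a0            ; a0 -> first array
                lea.l   B,a1            ; a1 -> second array
                clr.l   d0              ; running 32-bit sum
                moveq   #len-1,d1       ; dbra runs count+1 times
    LOOP:
                move.w  (a1)+,d2        ; next element of B
                mulu.w  (a0)+,d2        ; d2 = A[i]*B[i], unsigned 16x16 -> 32
                add.l   d2,d0           ; accumulate
                dbra    d1,LOOP         ; decrement d1.w, exit when it hits -1
                move.l  d0,scalar_prod  ; 500+1200+2100+3200 = 7000 ($1B58)
                SIMHALT

    A           DC.W    10,20,30,40
    B           DC.W    50,60,70,80
    scalar_prod DC.L    0

                END     START
    ```

    On a 68000 a taken dbra should cost 10 clocks, against 4 + 10 for subq.b followed by a taken bne.s, so this saves a few clocks per iteration.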