68000 Assembly - Array Sorting

I'm trying to write a Motorola 68000 assembly program to sort an array of bytes in descending order using the ROL instruction. The sorting algorithm follows the Bubble Sort approach.

Here's the code:

ORG $8000
    
START:
      moveq.l #len-1,d7  ; Number of total passes for sorting
      
OUTER_LOOP:
      lea.l array,a0     
      moveq.l #len-2,d6  ; Number of comparisons per pass
      
INNER_LOOP: 
      move.b (a0)+,d2    
      cmp.b (a0),d2      
      blt.s SWAP         ; Swap if needed
     
NOSWAP: 
       subq.l #1,d6
       bge.s INNER_LOOP  
       
       subq.l #1,d7
       bge.s OUTER_LOOP  

       SIMHALT           
      
SWAP:
    move.b -1(a0),d3    ; Load the left element into d3
    move.b  (a0),d4     ; Load the right element into d4

    lsl.w   #8,d3       ; Shift left element to the upper byte: d3 = left << 8
    or.w    d4,d3       ; Merge left and right into d3: d3 = (left << 8) | right
    rol.w   #8,d3       ; Rotate left, swapping bytes: d3 = (right << 8) | left

    move.w  d3,d5       
    lsr.w   #8,d5       ; Extract the new left element (was originally right)
    move.b  d5,-1(a0)   ; Store the new left value (was right)
    andi.w  #$00FF,d3   ; Mask to extract the new right element (was left)
    move.b  d3,(a0)     ; Store the new right value (was left)

    addq.l  #1,a0      

    bra.s NOSWAP        

; DATA
array DC.B 5,3,8,1,7    
len EQU 5               

    END START

Optimization Questions:

Are there redundant instructions that could be removed or simplified?
Could the loop structure be improved for efficiency?

I’d appreciate any insights on how to make this code faster and more elegant.

Output of my assembler:

Solution

In a bubble sort, the array elements that need swapping will always be adjacent. That's something to acknowledge.
Also, once the inner loop has finished its first run, the array's minimum will be at the end of the array. The next time that the inner loop runs we need no longer look at that final element. The same is true for any of the following runs of the inner loop. It is the job of the outer loop to take care of that, simply by loading the inner loop counter with the current value of the outer loop counter.

The improved nested loops look like:

START:
  moveq   #len-2, d7  ; (Number of elements - 1) - 1  
OUTER_LOOP:
  lea.l   array, a0
  move.w  d7, d6      ; Number of comparisons in this pass
INNER_LOOP: 
  move.b  (a0)+, d0    
  cmp.b   (a0), d0  
  bge.s   SKIP
              <<<<< Insert here the code that swaps two adjacent elements  
SKIP: 
  dbra    d6, INNER_LOOP  
  dbra    d7, OUTER_LOOP  
  SIMHALT

The first -1 in the setup for D7 is because for N elements there can only be N-1 comparisons between them.
The second -1 in the setup for D7 comes from using the dbra instruction that runs down to -1.
Loading D6 can be a word-sized operation knowing that dbra only considers the low 16 bits of the data register.

Your SWAP code should be in the loop and not somewhere else requiring a bra instruction to come back, as well as it uses too many registers and has redundant instructions:

Adding 1 to the A0 register should not happen here.
No need to reload the left element as it is already present in D0.
…

Using ROL is apparently required by the task description.
This is one of the many solutions that use the ROL instruction:

  rol.w    #8, d0       ; Move left element to bits 8-15
  move.b   (a0), d0     ; Move right element to bits 0-7
  move.b   d0, -1(a0)   ; Move right element to left position in array
  rol.w    #8, d0       ; Move left element to bits 0-7
  move.b   d0, (a0)     ; Move left element to right position in array

Apart from that one silly task requirement, I’d say that this solution makes the code both faster and more elegant.

But without the restriction it's even nicer:

  move.b  (a0), -1(a0)
  move.b  d0, (a0)