assembly 8051 greatest-common-divisor 8-bit

GCD (Greatest Common Divisor) for two 16-bit numbers

The goal here is to find GCD for two 16-bit numbers stored in little-endian notation. The numbers are stored in the following memory cells:

first number: 0x3000-0x3001
seconds number: 0x4000-0x4001
the result should go into: 0x5000-0x5001

The following example works for 8-bit numbers:

ORG 0000H

MOV 30H, #09
MOV 40H, #06
MOV A, 30H
MOV B, 40H
CJNE A, B, next
LJMP stop

next:
  JNC loop
  MOV A, R2
  MOV B, R1
  
loop:
  MOV R3, B
  DIV AB
  MOV A, R3  
  MOV R7, B
  CJNE R7, #00H, loop
  
stop:
  MOV 50H, A
       
END

Questions: How can I modify this code to operate on two 16-bit numbers? Do I need to operate on individual bytes and then use data pointer (DPTR) to load/move them to the desired memory cells?

(I'm using µVision IDE)

Solution

One way to calculate the Greatest Common Divisor is to use Euclid's algorithm. To get the GCD of two 16-bit numbers would require to scale up the very limited DIV AB we have on 8051. It's not impossible, but I have chosen to use an algorithm that does not need division at all.

The binary GCD algorithm

We can avoid the 8051's limited division capabilities and use an algorithm that replaces modulus calculation by a series of shifts and subtractions. Below is how the Binary GCD algorithm looks like in BASIC:

Procedure BinaryGCD
  ; Calculates R% = GCD(A%,B%)
  If A%=0 Or B%=0 Then
    R% = 1
  Else
    C% = 0
    While Even(A%) And Even(B%)
      Inc C%
      A% = Shr(A%,1)
      B% = Shr(B%,1)
    Wend
    Do
      While Even(A%)
        A% = Shr(A%,1)
      Wend
      While Even(B%)
        B% = Shr(B%,1)
      Wend
      Exit If A%=B%
      If A%>B% Then
        Sub A%,B%
      Else
        Sub B%,A%
      Endif
    Loop
    R% = Shl(A%,C%)
  Endif
Return

Implementation notes

The numbers that are stored in external memory are copied to the Bit Address Area in the internal memory. Why is this? Using the internal memory avoids the troubles of having to manipulate 16-bit addresses all the time. And using the Bit Address Area specifically, allows writing efficient code for testing even/odd conditions. Storing outside of the Bit Address Area would require more bytes and more cycles, not to mention that it would clobber the accumulator. Compare:

; Multiprecision number starts at 58h (not bit addressable)
MOV     A, #1
ANL     A, 58h
JNZ     IsOdd
; Uses 6 bytes and consumes 4 cycles

; Multiprecision number starts at 28h (bit addressable)
JB      64, IsOdd
; Uses 3 bytes and consumes 2 cycles

Please notice that my program can work with unsigned multiprecision numbers ranging from byte to qword and everything in between. I first created a series of handy subroutines to deal with the multibyte numbers. Many of these subroutines are called one time and could thus easily be inlined also. For the benefit of producing a readable program, I chose not to do that.

The subroutines

IN()
For every subroutine the R0, R1, and DPTR registers are, on input, pointers to the start of the involved numbers (least significant byte since this is little endean).

OUT()
On returning from mpLoad and mpStore, both R1 and DPTR will have advanced so as to enable processing adjacent items without having to reload pointer registers.
On returning from mpTest the accumulator A is important. If A is zero then the submitted number is zero.
On returning from mpCmp the accumulator A and the carry flag C are important. If A is zero then the submitted numbers are equal to each other. Else, a clear C indicates that the first number (@R0) is greater then the second number (@R1), and vice versa for a set C.

MOD()
Here are listed all the registers that were used by the subroutine but don't return a documented value.

; Copies a multiprecision number from external memory to internal memory
; IN(R1,DPTR) OUT(R1,DPTR) MOD(A,R2)
mpLoad:
  MOV     R2, #MPC
Load:
  MOVX    A, @DPTR
  MOV     @R1, A
  INC     DPTR
  INC     R1
  DJNZ    R2, Load
  RET

; Copies a multiprecision number from internal memory to external memory
; IN(R1,DPTR) OUT(R1,DPTR) MOD(A,R2)
mpStore:
  MOV     R2, #MPC
Store:
  MOV     A, @R1
  MOVX    @DPTR, A
  INC     DPTR
  INC     R1
  DJNZ    R2, Store
  RET

; Doubles a multiprecision number
; IN(R1) OUT() MOD(A,R1,R2)
mpShl:
  MOV     R2, #MPC
  CLR     C
Shl:
  MOV     A, @R1
  RLC     A
  MOV     @R1, A
  INC     R1
  DJNZ    R2, Shl
  RET

; Halves a multiprecision number
; IN(R1) OUT() MOD(A,R2)
mpShr:
  MOV     R2, #MPC
  MOV     A, R1
  ADD     A, R2   ; -> C == 0
  MOV     R1, A
Shr:
  DEC     R1
  MOV     A, @R1
  RRC     A
  MOV     @R1, A
  DJNZ    R2, Shr
  RET

; Tests if a multiprecision number is zero
; IN(R1) OUT(A) MOD(R1,R2)
mpTest:
  MOV     R2, #MPC
  MOV     A, #0
Test:
  ORL     A, @R1
  INC     R1
  DJNZ    R2, Test
  RET

; Compares two multiprecision numbers
; IN(R0,R1) OUT(A,C) MOD(R0,R1,R2)
mpCmp:
  MOV     R2, #MPC
  MOV     A, R1
  ADD     A, R2
  MOV     R1, A
  MOV     A, R0
  ADD     A, R2   ; -> C == 0
  MOV     R0, A
Cmp:
  DEC     R0
  DEC     R1
  MOV     A, @R0
  SUBB    A, @R1
  JNZ     CmpRet
  DJNZ    R2, Cmp
CmpRet:
  RET

; Subtracts two multiprecision numbers
; IN(R0,R1) OUT() MOD(A,R0,R1,R2)
mpSub:
  MOV     R2, #MPC
  CLR     C
Sub:
  MOV     A, @R0
  SUBB    A, @R1
  MOV     @R0, A
  INC     R0
  INC     R1
  DJNZ    R2, Sub
  RET

The main code

You can easily turn this into a fully fledged program.

MPC     EQU 2              ; Number of bytes per number aka precision
NumX    EQU 20h            ; Internal memory storage address for first number
NumY    EQU 28h            ; Internal memory storage address for second number
; -------------------------
  MOV     R1, #NumX
  MOV     DPTR, #3000h     ; External memory storage address for first number
  LCALL   mpLoad
  MOV     R1, #NumY
  MOV     DPTR, #4000h     ; External memory storage address for second number
  LCALL   mpLoad
; -------------------------
  MOV     R3, #MPC
  MOV     R0, #NumX
  MOV     R1, #NumX
  LCALL   mpTest           ; -> A
  JZ      SetOne
  MOV     R1, #NumY
  LCALL   mpTest           ; -> A
  JNZ     Begin
SetOne:
  INC     A                ; 0 -> 1, 255 -> 0, 255 -> 0, ...
  MOV     @R0, A
  MOV     A, #255
  INC     R0
  DJNZ    R3, SetOne
  SJMP    Copy
; -------------------------
Begin:
  MOV     R3, #0           ; Bits
While1:
  JB      0, While3        ; Jump if NumX[bit0] == 1
  JB      64, While2       ; Jump if NumY[bit0] == 1
  INC     R3               ; Bits++
  MOV     R1, #NumX
  LCALL   mpShr            ; X >> 1
  MOV     R1, #NumY
  LCALL   mpShr            ; Y >> 1
  SJMP    While1
; -------------------------
While2:
  JB      0, While3        ; Jump if NumX[bit0] == 1
  MOV     R1, #NumX
  LCALL   mpShr            ; X >> 1
  SJMP    While2
; - - - - - - - - - - - - -
While3:
  JB      64, Compare      ; Jump if NumY[bit0] == 1
  MOV     R1, #NumY
  LCALL   mpShr            ; Y >> 1
  SJMP    While3
; - - - - - - - - - - - - -
Compare:
  MOV     R0, #NumX
  MOV     R1, #NumY
  LCALL   mpCmp            ; -> A C
  JZ      Equal
  MOV     R0, #NumX
  MOV     R1, #NumY
  JNC     Minus            ; Do (X - Y)
  MOV     R0, #NumY
  MOV     R1, #NumX
Minus:
  LCALL   mpSub            ; Did (X - Y) or (Y - X)
  SJMP    While2
; -------------------------
Equal:
  MOV     A, R3            ; Bits
  JZ      Copy
Scale:                     ; X << Bits
  MOV     R1, #NumX
  LCALL   mpShl            ; X << 1
  DJNZ    R3, Scale
; -------------------------
Copy:
  MOV     R1, #NumX
  MOV     DPTR, #5000h     ; External memory storage address for resulting number
  LCALL   mpStore
; -------------------------
EXIT:
  NOP
  SJMP $

; Here you add the subroutines

END