floating-point mips division ieee-754 qtspim

MIPS: Division algorithm (Dividing significands of IEEE-754 format) gives incorrect answer for the last 4-5 bits (LSB)

For the case where divisor>dividend, the calculated mantissa gives wrong result for the last 4-5 least significant bits.

I'm trying to divide the significands/mantissa of two IEEE-754 floating point numbers. I've used this division algorithm

When divisor>dividend, the mantissa is normalised but still incorrect for the last 4-5 bits.

DivisionAlgorithm: #Divides Dividend Mantissa (in $a0) by Divisor Mantissa (in $a1) for 25 Iterations# #Returns Quotient Value in $v0#

    addi $sp, $sp, -12      #Decrement in $sp (Stack Pointer)
    sw $s0, 0($sp)      #Pushing $s0 into Stack
    sw $ra, 4($sp)      #Pushing $ra into Stack (Function Call Exists)
    sw $s1,8($sp)       #Pushing $s1 into stack 

    move $t0, $a0       #$t0 = Mantissa of Dividend/Remainder
    move $t1, $a1       #$t1 = Mantissa of Divisor

    add $s0, $0, $0     #$s0 = Initialization
    add $v1, $0, $0     #$v1 = 0 (Displacement of Decimal Point Initialized)
    addi $s1,$0,1       #$s1 = 1 (initialize loop variable to 1)
    add $t3,$0,33

loop:   
    beq $t1, $0, check      #If Divisor = 0, Branch to check
    sub $t0, $t0, $t1       #Dividend = Dividend - Divisor
    sll $s0, $s0, 1     #Quotient Register Shifted Left by 1-bit
    slt $t2, $t0, $0
    bne $t2, $0, else       #If Dividend < 0, Branch to else
    addi $s0, $s0, 1        #Setting Quotient LSb to 1
    j out
else:   add $t0, $t0, $t1       #Restoring Dividend Original Value

out:    srl $t1, $t1, 1     #Divisor Register Shifted Right by 1-bit
    j loop

check:  slt $t2, $a0, $a1       #If Dividend < Divisor, Call Function 'Normalization'
    beq $t2, $0, exit       #If Dividend > Divisor, Branch to exit
    move $a0, $s0       #$a0 = Quotient

    jal Normalization       #Function Call 
    j return

exit:   move $v0, $s0       #$v0 = Calculated Mantissa

return: lw $ra, 4($sp)      #Restoring $ra
    lw $s0, 0($sp)      #Restoring $s0
    lw $s1, 8($sp)      #restoring $s1
    addi $sp, $sp, 8        #Increment in $sp (Stack Pointer)
    jr $ra          #Return

Normalization: #Normalizes the Mantissa (in $a0) and Counts the Decimal Places Moved by Decimal Point# #Returns: # i) $v0 = Normalized Mantissa # # ii) $v1 = Count of Decimal Places #

    lui $t0, 0x40       #$t0 = 0x40 (1 at 23rd-bit)
    addi $t2, $0, 1     #$t2 = 1 (Initialization)

loop2:  and $t1, $a0, $t0       #Extracting 23rd-bit of Mantissa 
    bne $t1, $0, else2      #If 23rd-bit = 1; Branch to else2
    addi $t2, $t2, 1        #Increment in Count of Decimal Places Moved
    sll $a0, $a0, 1     #Mantissa Shifted Left (To Extract Next Bit)
    j loop2         

else2:  sll $a0, $a0, 1     #Setting 24th-bit = 1 (Implied)
    move $v0, $a0       #$v0 = Normalized Mantissa
    move $v1, $t2       #$v1 = Displacement of Decimal Point    
    jr $ra          #Return

For example, I expect the output of 2.75/6.355 to be 00111110110111011000111011001110 but the actual output is 00111110110111011000111011010110.

Solution

Your algorithm is incorrect.

A proper algorithm for the restoring division would be

qu=0
rem=dividend
repeat N times
  rem = rem - divisor
  if rem < 0
    rem = rem + divisor
    qu = (qu<<1)
  else
    qu = (qu<<1) + 1
  end
  rem = rem << 1
end

while yours is

qu=0
rem=dividend
repeat N times
  rem = rem - divisor
  if rem < 0
    rem = rem + divisor
    qu = (qu<<1)
  else
    qu = (qu<<1) + 1
  end
  divisor = divisor >> 1  // instead of left shift remainder
end

As the algorithm only relies on the comparison of divisor and rem, it seems equivalent to right shift divisor or to left shift rem. But it is not.
When right shifting divisor, you loose the least significant bits of the divisor.
This may affect the comparison and hence the quotient.
If you print the remainder, you will see that there is a significant impact on it and there may a 2x difference in its value vs the correct result.

It seems to be dangerous to multiply by 2 the remainder, as there may be an overflow. But if we look at the algorithm, we can see that it will not happen.
Initially dividend and divisor are the mantissa of some FP, and hence 1 ≤ divisor, dividend < 2 and the same holds for rem that is initially a copy of dividend. Note that this implies that rem<2*div
Now, we do the first computation step
⚫ if rem < div, then we multiply it by 2, and rem(=2*rem)<2*div.
⚫ if rem ≥ div, then we compute rem−div and multiply it by 2.
As initially rem < 2*div, rem(=2*(rem− div))<2*(2*div− div), and the property rem<2*div is still true.

So at every step we always have the property that rem<2*div and, provided 2*div can be coded, this insures that rem will never overflow.

In terms of implementation, you can code these numbers on the 24 LSB of an integer register. It is largely sufficient as the precision remains unchanged.
In your implementation, you loop 32 times. If you want to write the result in an IEEE mantissa, it is useless and the number of iterations can be reduced. It is sufficient to loop
24 times (mantissa size)
+ 1 (in case dividend < divisor, first bit will be 0, but the second bit is garantied to be at 1)

If you want to do some rounding on the result, you must do two additional steps to get the round and guard bits. After these two computation steps, if remainder ≠ 0, set sticky bit to 1, otherwise set it to 0. Then perform the rounding with the usual rules.