algorithm discrete-mathematics greatest-common-divisor computability

Lehmer's extended GCD algorithm implementation

While doing my own BigInteger implementation, I got stuck with the extended GCD algorithm, which is fundamental for finding modular multiplicative inverse. As the well-known Euclidean approach performs too slow, with hybrid and binary algorithms only 5-10 times faster, the choice was for the Lehmer's modification to the classic algorithm. But the difficulty is that, when it comes to describing the Lehmer's, all books that I found (Knuth, Handbook of Applied Cryptography, Internets, etc) have the same shortcomings:

Explanation is based on several tricks:
- the input numbers are always of the same length;
- the abstract CPU has signed registers, which can hold both the digit and the sign;
- the abstract CPU has semi-unlimited registers, i. e. it never overflows.
Only the basic GCD algorithm is provided, without focusing on the inverse cofactors.

As for the first problem, I was initially surprised by being unable to find any real-world implementation (don't point me to the GNU MP library — it's not a source to learn from), but finally took inspiration by decompiling the Microsoft's implementation from .Net 4.0, which is obviously based on the ideas from the paper “A double-digit Lehmer-Euclid algorithm for finding the GCD of long integers” by Jebelean. The resulting function is large, it looks scary, but works just great.

But Microsoft's library provides the basic functionality only, no cofactors are computed. Well, to be precise, some cofactors are computed during the shorthand step, and during the very first step those cofactors simply are the initial ones, but after the longhand step is performed then they do not match anymore. My current solution is to update the “real” cofactors in parallel with the “substitute” ones (except the very first step), but it makes the performance to drop below zero: the function now completes only 25-50 % faster than the binary method in basic mode. So, the problem is that, while the input numbers are fully updated during longhand steps only, the cofactors are updated on each shorthand step's iteration as well, thus destroying almost any benefit from Lehmer's approach.

To speed up things a little, I implemented a “fused multiply-add” function, because a “fused multiply-multiply-subtract” really does help updating the input numbers, — but this time the impact was negligible. Another improvement is based on the fact that usually only one cofactor is necessary, so the other one can be just not computed at all. This should halve the overhead (or even more so, since the second number is usually significantly smaller than the first one), however in practice the overhead reduces only by 25 to 50 % of expected.

Consequently, my questions come down to this:

Is there any full-scale explanation of Lehmer's algorithm, tied to practical implementation on real-world hardware (with unsigned words of limited size)?
Same as above, but regarding the extended GCD computation.

So, as much as I'm happy with the performance of basic algorithm, the opposite applies to the extended mode of operation, which is the primary in my case.

Solution

Finally, I consulted a mathematician and he quickly figured out the right formulae — very similar to those I was trying myself, bit still slightly different. This allowed to update the cofactors on longhand steps only, at the same time as the input numbers are fully updated.

However, to my big surprise, this measure alone had minor impact on the performance. Only when I re-implemented it as a “fused (A×X + B×Y)”, the speed improvement became noticeable. When computing both cofactors, it now runs at 56 % for 5-digit numbers and 34 % for 32K-digits, as compared to pure Lehmer GCD algorithm; for a single cofactor, the rate is 70 % and 52 % respectively. With previous implementations, it was merely 33% to 7% for both cofactors and 47 % to 14 % for single cofactor, so my dissatisfaction was obvious.

As for writing a paper as andy256 recommended so that other implementers would not have the same trouble, it won't be easy. I already wrote a “small” paper when explaining my current implementation to the mathematician, and it quite rapidly exceeded three A4-sized sheets — while containing the basic ideas only, without detailed explanations, overflow checks, branching, unrolling, etc. Now I partially understand why Knuth and others have used dirty tricks to keep the story short. Currently, I have no idea how to achieve the same level of simplicity while not loosing thoroughness; perhaps, it would require several big flowcharts combined with commentaries.

Update. It looks like I won't be able to complete and publish the library in near future (still have no luck in understanding Newton—Raphson division and Montgomery reduction), so I'll simply post the resulting function for those who are interested.

The code doesn't include obvious functions like ComputeGCD_Euclid and general-purpose routines like ComputeDivisionLonghand. The code also lacks any comments (with few exceptions) — you should be already familiar with Lehmer's idea in general and double-digit shorthand technique mentioned above, if you want to understand it and integrate into your own library.

An overview of number representation in my library. Digits are 32-bit unsigned integers, so that 64-bit unsigned and signed arithmetic may be used when needed. Digits are stored in a plain array (ValueDigits) from least to most significant (LSB), the actual size is stored explicitly (ValueLength), i. e. functions try to predict result size, but don't optimize memory consumption after computation. Objects are of value type (struct in .Net), but they reference digit arrays; therefore, objects are invariant, i. e. a = a + 1 creates a new object instead of altering an existing one.

Public Shared Function ComputeGCD(ByVal uLeft As BigUInteger, ByVal uRight As BigUInteger,
        ByRef uLeftInverse As BigUInteger, ByRef uRightInverse As BigUInteger, ByVal fComputeLeftInverse As Boolean, ByVal fComputeRightInverse As Boolean) As BigUInteger

    Dim fSwap As Boolean = False
    Select Case uLeft.CompareTo(uRight)
        Case 0
            uLeftInverse = Instance.Zero : uRightInverse = Instance.One : Return uRight
        Case Is < 0
            fSwap = fComputeLeftInverse : fComputeLeftInverse = fComputeRightInverse : fComputeRightInverse = fSwap
            fSwap = True : Swap(uLeft, uRight)
    End Select

    Dim uResult As BigUInteger
    If (uLeft.ValueLength = 1) AndAlso (uRight.ValueLength = 1) Then
        Dim wLeftInverse As UInt32, wRightInverse As UInt32
        uResult = ComputeGCD_Euclid(uLeft.DigitLowest, uRight.DigitLowest, wLeftInverse, wRightInverse)
        uLeftInverse = wLeftInverse : uRightInverse = wRightInverse
    ElseIf uLeft.ValueLength <= 2 Then
        uResult = ComputeGCD_Euclid(uLeft, uRight, uLeftInverse, uRightInverse)
    Else
        uResult = ComputeGCD_Lehmer(uLeft, uRight, uLeftInverse, uRightInverse, fComputeLeftInverse, fComputeRightInverse)
    End If

    If fSwap Then Swap(uLeftInverse, uRightInverse)

    Return uResult
End Function

Private Shared Function ComputeGCD_Lehmer(ByVal uLeft As BigUInteger, ByVal uRight As BigUInteger,
        ByRef uLeftInverse As BigUInteger, ByRef uRightInverse As BigUInteger, ByVal fComputeLeftInverse As Boolean, ByVal fComputeRightInverse As Boolean) As BigUInteger


    Dim uLeftCur As BigUInteger = uLeft, uRightCur As BigUInteger = uRight
    Dim uLeftInvPrev As BigUInteger = Instance.One, uRightInvPrev As BigUInteger = Instance.Zero,
        uLeftInvCur As BigUInteger = uRightInvPrev, uRightInvCur As BigUInteger = uLeftInvPrev,
        fInvInit As Boolean = False, fIterationIsEven As Boolean = True

    Dim dwLeftCur, dwRightCur As UInt64
    Dim wLeftInvPrev, wRightInvPrev, wLeftInvCur, wRightInvCur As UInt32
    Dim dwNumeratorMore, dwNumeratorLess, dwDenominatorMore, dwDenominatorLess, dwQuotientMore, dwQuotientLess As UInt64,
        wQuotient As UInt32
    Const nSubtractionThresholdBits As Byte = (5 - 1)

    Dim ndxDigitMax As Integer, fRightIsShorter As Boolean

    Dim fResultFound As Boolean = False
    Dim uRemainder As BigUInteger = uRightCur, uQuotient As BigUInteger
    Dim uTemp As BigUInteger = Nothing, dwTemp, dwTemp2 As UInt64

    Do While uLeftCur.ValueLength > 2

        ndxDigitMax = uLeftCur.ValueLength - 1 : fRightIsShorter = (uRightCur.ValueLength < uLeftCur.ValueLength)

        Dim fShorthandStep As Boolean = True, fShorthandIterationIsEven As Boolean
        If fRightIsShorter AndAlso (uLeftCur.ValueLength - uRightCur.ValueLength > 1) Then fShorthandStep = False

        If fShorthandStep Then

            dwLeftCur = uLeftCur.ValueDigits(ndxDigitMax - 1) Or (CULng(uLeftCur.ValueDigits(ndxDigitMax)) << DigitSize.Bits)
            dwRightCur = uRightCur.ValueDigits(ndxDigitMax - 1) Or If(fRightIsShorter, DigitValue.Zero, CULng(uRightCur.ValueDigits(ndxDigitMax)) << DigitSize.Bits)
            If ndxDigitMax >= 2 Then
                Dim nNormHead As Byte = GetNormalizationHead(uLeftCur.ValueDigits(ndxDigitMax))
                If nNormHead <> ByteValue.Zero Then
                    dwLeftCur = (dwLeftCur << nNormHead) Or (uLeftCur.ValueDigits(ndxDigitMax - 2) >> (DigitSize.Bits - nNormHead))
                    dwRightCur = (dwRightCur << nNormHead) Or (uRightCur.ValueDigits(ndxDigitMax - 2) >> (DigitSize.Bits - nNormHead))
                End If
            End If

            If CUInt(dwRightCur >> DigitSize.Bits) = DigitValue.Zero Then fShorthandStep = False

        End If

        If fShorthandStep Then

            ' First iteration, where overflow may occur in general formulae.

            If dwLeftCur = dwRightCur Then
                fShorthandStep = False
            Else
                If dwLeftCur = DoubleValue.Full Then dwLeftCur >>= 1 : dwRightCur >>= 1
                dwDenominatorMore = dwRightCur : dwDenominatorLess = dwRightCur + DigitValue.One
                dwNumeratorMore = dwLeftCur + DigitValue.One : dwNumeratorLess = dwLeftCur

                If (dwNumeratorMore >> nSubtractionThresholdBits) <= dwDenominatorMore Then
                    wQuotient = DigitValue.Zero
                    Do
                        wQuotient += DigitValue.One : dwNumeratorMore -= dwDenominatorMore
                    Loop While dwNumeratorMore >= dwDenominatorMore
                    dwQuotientMore = wQuotient
                Else
                    dwQuotientMore = dwNumeratorMore \ dwDenominatorMore
                    If dwQuotientMore >= DigitValue.BitHi Then fShorthandStep = False
                    wQuotient = CUInt(dwQuotientMore)
                End If

                If fShorthandStep Then
                    If (dwNumeratorLess >> nSubtractionThresholdBits) <= dwDenominatorLess Then
                        wQuotient = DigitValue.Zero
                        Do
                            wQuotient += DigitValue.One : dwNumeratorLess -= dwDenominatorLess
                        Loop While dwNumeratorLess >= dwDenominatorLess
                        dwQuotientLess = wQuotient
                    Else
                        dwQuotientLess = dwNumeratorLess \ dwDenominatorLess
                    End If
                    If dwQuotientMore <> dwQuotientLess Then fShorthandStep = False
                End If

            End If

        End If

        If fShorthandStep Then

            ' Prepare for the second iteration.
            wLeftInvPrev = DigitValue.Zero : wLeftInvCur = DigitValue.One
            wRightInvPrev = DigitValue.One : wRightInvCur = wQuotient
            dwTemp = dwLeftCur - wQuotient * dwRightCur : dwLeftCur = dwRightCur : dwRightCur = dwTemp
            fShorthandIterationIsEven = True

            fIterationIsEven = Not fIterationIsEven

            ' Other iterations, no overflow possible(?).
            Do

                If fShorthandIterationIsEven Then
                    If dwRightCur = wRightInvCur Then Exit Do
                    dwDenominatorMore = dwRightCur - wRightInvCur : dwDenominatorLess = dwRightCur + wLeftInvCur
                    dwNumeratorMore = dwLeftCur + wRightInvPrev : dwNumeratorLess = dwLeftCur - wLeftInvPrev
                Else
                    If dwRightCur = wLeftInvCur Then Exit Do
                    dwDenominatorMore = dwRightCur - wLeftInvCur : dwDenominatorLess = dwRightCur + wRightInvCur
                    dwNumeratorMore = dwLeftCur + wLeftInvPrev : dwNumeratorLess = dwLeftCur - wRightInvPrev
                End If

                If (dwNumeratorMore >> nSubtractionThresholdBits) <= dwDenominatorMore Then
                    wQuotient = DigitValue.Zero
                    Do
                        wQuotient += DigitValue.One : dwNumeratorMore -= dwDenominatorMore
                    Loop While dwNumeratorMore >= dwDenominatorMore
                    dwQuotientMore = wQuotient
                Else
                    dwQuotientMore = dwNumeratorMore \ dwDenominatorMore
                    If dwQuotientMore >= DigitValue.BitHi Then Exit Do
                    wQuotient = CUInt(dwQuotientMore)
                End If

                If (dwNumeratorLess >> nSubtractionThresholdBits) <= dwDenominatorLess Then
                    wQuotient = DigitValue.Zero
                    Do
                        wQuotient += DigitValue.One : dwNumeratorLess -= dwDenominatorLess
                    Loop While dwNumeratorLess >= dwDenominatorLess
                    dwQuotientLess = wQuotient
                Else
                    dwQuotientLess = dwNumeratorLess \ dwDenominatorLess
                End If
                If dwQuotientMore <> dwQuotientLess Then Exit Do

                dwTemp = wLeftInvPrev + wQuotient * wLeftInvCur : dwTemp2 = wRightInvPrev + wQuotient * wRightInvCur
                If (dwTemp >= DigitValue.BitHi) OrElse (dwTemp2 >= DigitValue.BitHi) Then Exit Do
                wLeftInvPrev = wLeftInvCur : wLeftInvCur = CUInt(dwTemp)
                wRightInvPrev = wRightInvCur : wRightInvCur = CUInt(dwTemp2)
                dwTemp = dwLeftCur - wQuotient * dwRightCur : dwLeftCur = dwRightCur : dwRightCur = dwTemp
                fShorthandIterationIsEven = Not fShorthandIterationIsEven

                fIterationIsEven = Not fIterationIsEven

            Loop

        End If

        If (Not fShorthandStep) OrElse (wRightInvPrev = DigitValue.Zero) Then
            ' Longhand step.

            uQuotient = ComputeDivisionLonghand(uLeftCur, uRightCur, uTemp) : If uTemp.IsZero Then fResultFound = True : Exit Do
            uRemainder = uTemp

            fIterationIsEven = Not fIterationIsEven
            If fComputeLeftInverse Then
                uTemp = uLeftInvPrev + uQuotient * uLeftInvCur : uLeftInvPrev = uLeftInvCur : uLeftInvCur = uTemp
            End If
            If fComputeRightInverse Then
                uTemp = uRightInvPrev + uQuotient * uRightInvCur : uRightInvPrev = uRightInvCur : uRightInvCur = uTemp
            End If
            fInvInit = True

            uLeftCur = uRightCur : uRightCur = uRemainder

        Else
            ' Shorthand step finalization.

            If Not fInvInit Then
                If fComputeLeftInverse Then uLeftInvPrev = wLeftInvPrev : uLeftInvCur = wLeftInvCur
                If fComputeRightInverse Then uRightInvPrev = wRightInvPrev : uRightInvCur = wRightInvCur
                fInvInit = True
            Else
                If fComputeLeftInverse Then ComputeFusedMulMulAdd(uLeftInvPrev, uLeftInvCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur)
                If fComputeRightInverse Then ComputeFusedMulMulAdd(uRightInvPrev, uRightInvCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur)
            End If

            ComputeFusedMulMulSub(uLeftCur, uRightCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur, fShorthandIterationIsEven)

        End If

    Loop

    ' Final rounds: numbers are quite short now.
    If Not fResultFound Then

        ndxDigitMax = uLeftCur.ValueLength - 1 : fRightIsShorter = (uRightCur.ValueLength < uLeftCur.ValueLength)
        If ndxDigitMax = 0 Then
            dwLeftCur = uLeftCur.ValueDigits(0)
            dwRightCur = uRightCur.ValueDigits(0)
        Else
            dwLeftCur = uLeftCur.ValueDigits(0) Or (CULng(uLeftCur.ValueDigits(1)) << DigitSize.Bits)
            dwRightCur = uRightCur.ValueDigits(0) Or If(fRightIsShorter, DigitValue.Zero, CULng(uRightCur.ValueDigits(1)) << DigitSize.Bits)
        End If

        Do While dwLeftCur >= DigitValue.BitHi

            Dim dwRemainder As UInt64 = dwLeftCur

            If (dwRemainder >> nSubtractionThresholdBits) <= dwRightCur Then
                wQuotient = DigitValue.Zero
                Do
                    wQuotient += DigitValue.One : dwRemainder -= dwRightCur
                Loop While dwRemainder >= dwRightCur
                dwQuotientMore = wQuotient
            Else
                dwQuotientMore = dwLeftCur \ dwRightCur
                dwRemainder = dwLeftCur - dwQuotientMore * dwRightCur
            End If

            If dwRemainder = DigitValue.Zero Then fResultFound = True : Exit Do


            fIterationIsEven = Not fIterationIsEven
            If dwQuotientMore < DigitValue.BitHi Then
                wQuotient = CUInt(dwQuotientMore)
                If fComputeLeftInverse Then ComputeFusedMulAdd(uLeftInvPrev, uLeftInvCur, wQuotient)
                If fComputeRightInverse Then ComputeFusedMulAdd(uRightInvPrev, uRightInvCur, wQuotient)
            Else
                If fComputeLeftInverse Then
                    uTemp = uLeftInvPrev + dwQuotientMore * uLeftInvCur : uLeftInvPrev = uLeftInvCur : uLeftInvCur = uTemp
                End If
                If fComputeRightInverse Then
                    uTemp = uRightInvPrev + dwQuotientMore * uRightInvCur : uRightInvPrev = uRightInvCur : uRightInvCur = uTemp
                End If
            End If

            dwLeftCur = dwRightCur : dwRightCur = dwRemainder

        Loop

        If fResultFound Then

            uRightCur = dwRightCur

        Else

            ' Final rounds: both numbers have only one digit now, and this digit has MS-bit unset.
            Dim wLeftCur As UInt32 = CUInt(dwLeftCur), wRightCur As UInt32 = CUInt(dwRightCur)

            Do

                Dim wRemainder As UInt32 = wLeftCur

                If (wRemainder >> nSubtractionThresholdBits) <= wRightCur Then
                    wQuotient = DigitValue.Zero
                    Do
                        wQuotient += DigitValue.One : wRemainder -= wRightCur
                    Loop While wRemainder >= wRightCur
                Else
                    wQuotient = wLeftCur \ wRightCur
                    wRemainder = wLeftCur - wQuotient * wRightCur
                End If

                If wRemainder = DigitValue.Zero Then fResultFound = True : Exit Do

                fIterationIsEven = Not fIterationIsEven
                If fComputeLeftInverse Then ComputeFusedMulAdd(uLeftInvPrev, uLeftInvCur, wQuotient)
                If fComputeRightInverse Then ComputeFusedMulAdd(uRightInvPrev, uRightInvCur, wQuotient)

                wLeftCur = wRightCur : wRightCur = wRemainder

            Loop

            uRightCur = wRightCur

        End If


    End If

    If fComputeLeftInverse Then
        uLeftInverse = If(fIterationIsEven, uRight - uLeftInvCur, uLeftInvCur)
    End If
    If fComputeRightInverse Then
        uRightInverse = If(fIterationIsEven, uRightInvCur, uLeft - uRightInvCur)
    End If

    Return uRightCur
End Function

''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
Private Shared Sub ComputeFusedMulMulAdd(
        ByRef uLeftInvPrev As BigUInteger, ByRef uLeftInvCur As BigUInteger,
        ByVal wLeftInvPrev As UInt32, ByVal wLeftInvCur As UInt32, ByVal wRightInvPrev As UInt32, ByVal wRightInvCur As UInt32)

    Dim ndxDigitMaxPrev As Integer = uLeftInvPrev.ValueLength - 1, ndxDigitMaxCur As Integer = uLeftInvCur.ValueLength - 1,
        ndxDigitMaxNew As Integer = ndxDigitMaxCur + 1

    Dim awLeftInvPrev() As UInt32 = uLeftInvPrev.ValueDigits, awLeftInvCur() As UInt32 = uLeftInvCur.ValueDigits
    Dim awLeftInvPrevNew(0 To ndxDigitMaxNew) As UInt32, awLeftInvCurNew(0 To ndxDigitMaxNew) As UInt32
    Dim dwResult As UInt64, wCarryLeftPrev As UInt32 = DigitValue.Zero, wCarryLeftCur As UInt32 = DigitValue.Zero
    Dim wDigitLeftInvPrev, wDigitLeftInvCur As UInt32

    For ndxDigit As Integer = 0 To ndxDigitMaxPrev
        wDigitLeftInvPrev = awLeftInvPrev(ndxDigit) : wDigitLeftInvCur = awLeftInvCur(ndxDigit)

        dwResult = wCarryLeftPrev + wLeftInvPrev * CULng(wDigitLeftInvPrev) + wRightInvPrev * CULng(wDigitLeftInvCur)
        awLeftInvPrevNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftPrev = CUInt(dwResult >> DigitSize.Bits)

        dwResult = wCarryLeftCur + wLeftInvCur * CULng(wDigitLeftInvPrev) + wRightInvCur * CULng(wDigitLeftInvCur)
        awLeftInvCurNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftCur = CUInt(dwResult >> DigitSize.Bits)

    Next

    If ndxDigitMaxCur > ndxDigitMaxPrev Then

        For ndxDigit As Integer = ndxDigitMaxPrev + 1 To ndxDigitMaxCur
            wDigitLeftInvCur = awLeftInvCur(ndxDigit)

            dwResult = wCarryLeftPrev + wRightInvPrev * CULng(wDigitLeftInvCur)
            awLeftInvPrevNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftPrev = CUInt(dwResult >> DigitSize.Bits)

            dwResult = wCarryLeftCur + wRightInvCur * CULng(wDigitLeftInvCur)
            awLeftInvCurNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftCur = CUInt(dwResult >> DigitSize.Bits)

        Next

    End If

    If wCarryLeftPrev <> DigitValue.Zero Then awLeftInvPrevNew(ndxDigitMaxNew) = wCarryLeftPrev
    If wCarryLeftCur <> DigitValue.Zero Then awLeftInvCurNew(ndxDigitMaxNew) = wCarryLeftCur

    uLeftInvPrev = New BigUInteger(awLeftInvPrevNew) : uLeftInvCur = New BigUInteger(awLeftInvCurNew)

End Sub

''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
Private Shared Sub ComputeFusedMulMulSub(
        ByRef uLeftCur As BigUInteger, ByRef uRightCur As BigUInteger,
        ByVal wLeftInvPrev As UInt32, ByVal wLeftInvCur As UInt32, ByVal wRightInvPrev As UInt32, ByVal wRightInvCur As UInt32,
        ByVal fShorthandIterationIsEven As Boolean)

    Dim ndxDigitMax As Integer = uLeftCur.ValueLength - 1,
        fRightIsShorter As Boolean = (uRightCur.ValueLength < uLeftCur.ValueLength),
        ndxDigitStop As Integer = If(fRightIsShorter, ndxDigitMax - 1, ndxDigitMax)

    Dim awLeftCur() As UInt32 = uLeftCur.ValueDigits, awRightCur() As UInt32 = uRightCur.ValueDigits
    Dim awLeftNew(0 To ndxDigitMax) As UInt32, awRightNew(0 To ndxDigitStop) As UInt32
    Dim iTemp As Int64, wCarryLeft As Int32 = 0I, wCarryRight As Int32 = 0I
    Dim wDigitLeftCur, wDigitRightCur As UInt32

    If fShorthandIterationIsEven Then

        For ndxDigit As Integer = 0 To ndxDigitStop
            wDigitLeftCur = awLeftCur(ndxDigit) : wDigitRightCur = awRightCur(ndxDigit)
            iTemp = wCarryLeft + CLng(wDigitRightCur) * wRightInvPrev - CLng(wDigitLeftCur) * wLeftInvPrev
            awLeftNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryLeft = CInt(iTemp >> DigitSize.Bits)
            iTemp = wCarryRight + CLng(wDigitLeftCur) * wLeftInvCur - CLng(wDigitRightCur) * wRightInvCur
            awRightNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryRight = CInt(iTemp >> DigitSize.Bits)
        Next
        If fRightIsShorter Then
            wDigitLeftCur = awLeftCur(ndxDigitMax)
            iTemp = wCarryLeft - CLng(wDigitLeftCur) * wLeftInvPrev
            awLeftNew(ndxDigitMax) = CUInt(iTemp And DigitValue.Full)
        End If

    Else

        For ndxDigit As Integer = 0 To ndxDigitStop
            wDigitLeftCur = awLeftCur(ndxDigit) : wDigitRightCur = awRightCur(ndxDigit)
            iTemp = wCarryLeft + CLng(wDigitLeftCur) * wLeftInvPrev - CLng(wDigitRightCur) * wRightInvPrev
            awLeftNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryLeft = CInt(iTemp >> DigitSize.Bits)
            iTemp = wCarryRight + CLng(wDigitRightCur) * wRightInvCur - CLng(wDigitLeftCur) * wLeftInvCur
            awRightNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryRight = CInt(iTemp >> DigitSize.Bits)
        Next
        If fRightIsShorter Then
            wDigitLeftCur = awLeftCur(ndxDigitMax)
            iTemp = wCarryLeft + CLng(wDigitLeftCur) * wLeftInvPrev
            awLeftNew(ndxDigitMax) = CUInt(iTemp And DigitValue.Full)
        End If

    End If

    uLeftCur = New BigUInteger(awLeftNew) : uRightCur = New BigUInteger(awRightNew)

End Sub

''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
Private Shared Sub ComputeFusedMulAdd(ByRef uLeftInvPrev As BigUInteger, ByRef uLeftInvCur As BigUInteger, ByVal wQuotient As UInt32)

    Dim ndxDigitPrevMax As Integer = uLeftInvPrev.ValueLength - 1, ndxDigitCurMax As Integer = uLeftInvCur.ValueLength - 1,
        ndxDigitNewMax As Integer = ndxDigitCurMax + 1
    Dim awLeftInvPrev() As UInt32 = uLeftInvPrev.ValueDigits, awLeftInvCur() As UInt32 = uLeftInvCur.ValueDigits,
        awLeftInvNew(0 To ndxDigitNewMax) As UInt32
    Dim dwResult As UInt64 = DigitValue.Zero, wCarry As UInt32 = DigitValue.Zero

    For ndxDigit As Integer = 0 To ndxDigitPrevMax
        dwResult = CULng(wCarry) + awLeftInvPrev(ndxDigit) + CULng(wQuotient) * awLeftInvCur(ndxDigit)
        awLeftInvNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarry = CUInt(dwResult >> DigitSize.Bits)
    Next

    For ndxDigit As Integer = ndxDigitPrevMax + 1 To ndxDigitCurMax
        dwResult = CULng(wCarry) + CULng(wQuotient) * awLeftInvCur(ndxDigit)
        awLeftInvNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarry = CUInt(dwResult >> DigitSize.Bits)
    Next

    If wCarry <> DigitValue.Zero Then awLeftInvNew(ndxDigitNewMax) = wCarry

    uLeftInvPrev = uLeftInvCur : uLeftInvCur = New BigUInteger(awLeftInvNew)

End Sub

If you want to use this code directly, you may need Visual Basic 2012 compiler for some constructs — I didn't check on previous versions; nor am I aware of minimum .Net version (at least 3.5 should suffice); compiled applications are known to run on Mono, although with inferior performance. The only thing I'm absolutely sure about is that one shouldn't try to use automatic VB-to-C# translators, as they are terribly bad in subjects like this; rely on your own head only.