algorithmdiscrete-mathematicsgreatest-common-divisorcomputability

Lehmer's extended GCD algorithm implementation


While doing my own BigInteger implementation, I got stuck with the extended GCD algorithm, which is fundamental for finding modular multiplicative inverse. As the well-known Euclidean approach performs too slow, with hybrid and binary algorithms only 5­-10 times faster, the choice was for the Lehmer's modification to the classic algorithm. But the difficulty is that, when it comes to describing the Lehmer's, all books that I found (Knuth, Handbook of Applied Cryptography, Internets, etc) have the same shortcomings:

  1. Explanation is based on several tricks:
    • the input numbers are always of the same length;
    • the abstract CPU has signed registers, which can hold both the digit and the sign;
    • the abstract CPU has semi-unlimited registers, i. e. it never overflows.
  2. Only the basic GCD algorithm is provided, without focusing on the inverse cofactors.

As for the first problem, I was initially surprised by being unable to find any real-world implementation (don't point me to the GNU MP library — it's not a source to learn from), but finally took inspiration by decompiling the Microsoft's implementation from .Net 4.0, which is obviously based on the ideas from the paper “A double-digit Lehmer-Euclid algorithm for finding the GCD of long integers” by Jebelean. The resulting function is large, it looks scary, but works just great.

But Microsoft's library provides the basic functionality only, no cofactors are computed. Well, to be precise, some cofactors are computed during the shorthand step, and during the very first step those cofactors simply are the initial ones, but after the longhand step is performed then they do not match anymore. My current solution is to update the “real” cofactors in parallel with the “substitute” ones (except the very first step), but it makes the performance to drop below zero: the function now completes only 25­-50 % faster than the binary method in basic mode. So, the problem is that, while the input numbers are fully updated during longhand steps only, the cofactors are updated on each shorthand step's iteration as well, thus destroying almost any benefit from Lehmer's approach.

To speed up things a little, I implemented a “fused multiply-add” function, because a “fused multiply-multiply-subtract” really does help updating the input numbers, — but this time the impact was negligible. Another improvement is based on the fact that usually only one cofactor is necessary, so the other one can be just not computed at all. This should halve the overhead (or even more so, since the second number is usually significantly smaller than the first one), however in practice the overhead reduces only by 25 to 50 % of expected.

Consequently, my questions come down to this:

  1. Is there any full-scale explanation of Lehmer's algorithm, tied to practical implementation on real-world hardware (with unsigned words of limited size)?
  2. Same as above, but regarding the extended GCD computation.

So, as much as I'm happy with the performance of basic algorithm, the opposite applies to the extended mode of operation, which is the primary in my case.


Solution

  • Finally, I consulted a mathematician and he quickly figured out the right formulae — very similar to those I was trying myself, bit still slightly different. This allowed to update the cofactors on longhand steps only, at the same time as the input numbers are fully updated.

    However, to my big surprise, this measure alone had minor impact on the performance. Only when I re-implemented it as a “fused (A×X + B×Y)”, the speed improvement became noticeable. When computing both cofactors, it now runs at 56 % for 5-digit numbers and 34 % for 32K-digits, as compared to pure Lehmer GCD algorithm; for a single cofactor, the rate is 70 % and 52 % respectively. With previous implementations, it was merely 33% to 7% for both cofactors and 47 % to 14 % for single cofactor, so my dissatisfaction was obvious.

    As for writing a paper as andy256 recommended so that other implementers would not have the same trouble, it won't be easy. I already wrote a “small” paper when explaining my current implementation to the mathematician, and it quite rapidly exceeded three A4-sized sheets — while containing the basic ideas only, without detailed explanations, overflow checks, branching, unrolling, etc. Now I partially understand why Knuth and others have used dirty tricks to keep the story short. Currently, I have no idea how to achieve the same level of simplicity while not loosing thoroughness; perhaps, it would require several big flowcharts combined with commentaries.


    Update. It looks like I won't be able to complete and publish the library in near future (still have no luck in understanding Newton—Raphson division and Montgomery reduction), so I'll simply post the resulting function for those who are interested.

    The code doesn't include obvious functions like ComputeGCD_Euclid and general-purpose routines like ComputeDivisionLonghand. The code also lacks any comments (with few exceptions) — you should be already familiar with Lehmer's idea in general and double-digit shorthand technique mentioned above, if you want to understand it and integrate into your own library.

    An overview of number representation in my library. Digits are 32-bit unsigned integers, so that 64-bit unsigned and signed arithmetic may be used when needed. Digits are stored in a plain array (ValueDigits) from least to most significant (LSB), the actual size is stored explicitly (ValueLength), i. e. functions try to predict result size, but don't optimize memory consumption after computation. Objects are of value type (struct in .Net), but they reference digit arrays; therefore, objects are invariant, i. e. a = a + 1 creates a new object instead of altering an existing one.

    Public Shared Function ComputeGCD(ByVal uLeft As BigUInteger, ByVal uRight As BigUInteger,
            ByRef uLeftInverse As BigUInteger, ByRef uRightInverse As BigUInteger, ByVal fComputeLeftInverse As Boolean, ByVal fComputeRightInverse As Boolean) As BigUInteger
    
        Dim fSwap As Boolean = False
        Select Case uLeft.CompareTo(uRight)
            Case 0
                uLeftInverse = Instance.Zero : uRightInverse = Instance.One : Return uRight
            Case Is < 0
                fSwap = fComputeLeftInverse : fComputeLeftInverse = fComputeRightInverse : fComputeRightInverse = fSwap
                fSwap = True : Swap(uLeft, uRight)
        End Select
    
        Dim uResult As BigUInteger
        If (uLeft.ValueLength = 1) AndAlso (uRight.ValueLength = 1) Then
            Dim wLeftInverse As UInt32, wRightInverse As UInt32
            uResult = ComputeGCD_Euclid(uLeft.DigitLowest, uRight.DigitLowest, wLeftInverse, wRightInverse)
            uLeftInverse = wLeftInverse : uRightInverse = wRightInverse
        ElseIf uLeft.ValueLength <= 2 Then
            uResult = ComputeGCD_Euclid(uLeft, uRight, uLeftInverse, uRightInverse)
        Else
            uResult = ComputeGCD_Lehmer(uLeft, uRight, uLeftInverse, uRightInverse, fComputeLeftInverse, fComputeRightInverse)
        End If
    
        If fSwap Then Swap(uLeftInverse, uRightInverse)
    
        Return uResult
    End Function
    
    Private Shared Function ComputeGCD_Lehmer(ByVal uLeft As BigUInteger, ByVal uRight As BigUInteger,
            ByRef uLeftInverse As BigUInteger, ByRef uRightInverse As BigUInteger, ByVal fComputeLeftInverse As Boolean, ByVal fComputeRightInverse As Boolean) As BigUInteger
    
    
        Dim uLeftCur As BigUInteger = uLeft, uRightCur As BigUInteger = uRight
        Dim uLeftInvPrev As BigUInteger = Instance.One, uRightInvPrev As BigUInteger = Instance.Zero,
            uLeftInvCur As BigUInteger = uRightInvPrev, uRightInvCur As BigUInteger = uLeftInvPrev,
            fInvInit As Boolean = False, fIterationIsEven As Boolean = True
    
        Dim dwLeftCur, dwRightCur As UInt64
        Dim wLeftInvPrev, wRightInvPrev, wLeftInvCur, wRightInvCur As UInt32
        Dim dwNumeratorMore, dwNumeratorLess, dwDenominatorMore, dwDenominatorLess, dwQuotientMore, dwQuotientLess As UInt64,
            wQuotient As UInt32
        Const nSubtractionThresholdBits As Byte = (5 - 1)
    
        Dim ndxDigitMax As Integer, fRightIsShorter As Boolean
    
        Dim fResultFound As Boolean = False
        Dim uRemainder As BigUInteger = uRightCur, uQuotient As BigUInteger
        Dim uTemp As BigUInteger = Nothing, dwTemp, dwTemp2 As UInt64
    
        Do While uLeftCur.ValueLength > 2
    
            ndxDigitMax = uLeftCur.ValueLength - 1 : fRightIsShorter = (uRightCur.ValueLength < uLeftCur.ValueLength)
    
            Dim fShorthandStep As Boolean = True, fShorthandIterationIsEven As Boolean
            If fRightIsShorter AndAlso (uLeftCur.ValueLength - uRightCur.ValueLength > 1) Then fShorthandStep = False
    
            If fShorthandStep Then
    
                dwLeftCur = uLeftCur.ValueDigits(ndxDigitMax - 1) Or (CULng(uLeftCur.ValueDigits(ndxDigitMax)) << DigitSize.Bits)
                dwRightCur = uRightCur.ValueDigits(ndxDigitMax - 1) Or If(fRightIsShorter, DigitValue.Zero, CULng(uRightCur.ValueDigits(ndxDigitMax)) << DigitSize.Bits)
                If ndxDigitMax >= 2 Then
                    Dim nNormHead As Byte = GetNormalizationHead(uLeftCur.ValueDigits(ndxDigitMax))
                    If nNormHead <> ByteValue.Zero Then
                        dwLeftCur = (dwLeftCur << nNormHead) Or (uLeftCur.ValueDigits(ndxDigitMax - 2) >> (DigitSize.Bits - nNormHead))
                        dwRightCur = (dwRightCur << nNormHead) Or (uRightCur.ValueDigits(ndxDigitMax - 2) >> (DigitSize.Bits - nNormHead))
                    End If
                End If
    
                If CUInt(dwRightCur >> DigitSize.Bits) = DigitValue.Zero Then fShorthandStep = False
    
            End If
    
            If fShorthandStep Then
    
                ' First iteration, where overflow may occur in general formulae.
    
                If dwLeftCur = dwRightCur Then
                    fShorthandStep = False
                Else
                    If dwLeftCur = DoubleValue.Full Then dwLeftCur >>= 1 : dwRightCur >>= 1
                    dwDenominatorMore = dwRightCur : dwDenominatorLess = dwRightCur + DigitValue.One
                    dwNumeratorMore = dwLeftCur + DigitValue.One : dwNumeratorLess = dwLeftCur
    
                    If (dwNumeratorMore >> nSubtractionThresholdBits) <= dwDenominatorMore Then
                        wQuotient = DigitValue.Zero
                        Do
                            wQuotient += DigitValue.One : dwNumeratorMore -= dwDenominatorMore
                        Loop While dwNumeratorMore >= dwDenominatorMore
                        dwQuotientMore = wQuotient
                    Else
                        dwQuotientMore = dwNumeratorMore \ dwDenominatorMore
                        If dwQuotientMore >= DigitValue.BitHi Then fShorthandStep = False
                        wQuotient = CUInt(dwQuotientMore)
                    End If
    
                    If fShorthandStep Then
                        If (dwNumeratorLess >> nSubtractionThresholdBits) <= dwDenominatorLess Then
                            wQuotient = DigitValue.Zero
                            Do
                                wQuotient += DigitValue.One : dwNumeratorLess -= dwDenominatorLess
                            Loop While dwNumeratorLess >= dwDenominatorLess
                            dwQuotientLess = wQuotient
                        Else
                            dwQuotientLess = dwNumeratorLess \ dwDenominatorLess
                        End If
                        If dwQuotientMore <> dwQuotientLess Then fShorthandStep = False
                    End If
    
                End If
    
            End If
    
            If fShorthandStep Then
    
                ' Prepare for the second iteration.
                wLeftInvPrev = DigitValue.Zero : wLeftInvCur = DigitValue.One
                wRightInvPrev = DigitValue.One : wRightInvCur = wQuotient
                dwTemp = dwLeftCur - wQuotient * dwRightCur : dwLeftCur = dwRightCur : dwRightCur = dwTemp
                fShorthandIterationIsEven = True
    
                fIterationIsEven = Not fIterationIsEven
    
                ' Other iterations, no overflow possible(?).
                Do
    
                    If fShorthandIterationIsEven Then
                        If dwRightCur = wRightInvCur Then Exit Do
                        dwDenominatorMore = dwRightCur - wRightInvCur : dwDenominatorLess = dwRightCur + wLeftInvCur
                        dwNumeratorMore = dwLeftCur + wRightInvPrev : dwNumeratorLess = dwLeftCur - wLeftInvPrev
                    Else
                        If dwRightCur = wLeftInvCur Then Exit Do
                        dwDenominatorMore = dwRightCur - wLeftInvCur : dwDenominatorLess = dwRightCur + wRightInvCur
                        dwNumeratorMore = dwLeftCur + wLeftInvPrev : dwNumeratorLess = dwLeftCur - wRightInvPrev
                    End If
    
                    If (dwNumeratorMore >> nSubtractionThresholdBits) <= dwDenominatorMore Then
                        wQuotient = DigitValue.Zero
                        Do
                            wQuotient += DigitValue.One : dwNumeratorMore -= dwDenominatorMore
                        Loop While dwNumeratorMore >= dwDenominatorMore
                        dwQuotientMore = wQuotient
                    Else
                        dwQuotientMore = dwNumeratorMore \ dwDenominatorMore
                        If dwQuotientMore >= DigitValue.BitHi Then Exit Do
                        wQuotient = CUInt(dwQuotientMore)
                    End If
    
                    If (dwNumeratorLess >> nSubtractionThresholdBits) <= dwDenominatorLess Then
                        wQuotient = DigitValue.Zero
                        Do
                            wQuotient += DigitValue.One : dwNumeratorLess -= dwDenominatorLess
                        Loop While dwNumeratorLess >= dwDenominatorLess
                        dwQuotientLess = wQuotient
                    Else
                        dwQuotientLess = dwNumeratorLess \ dwDenominatorLess
                    End If
                    If dwQuotientMore <> dwQuotientLess Then Exit Do
    
                    dwTemp = wLeftInvPrev + wQuotient * wLeftInvCur : dwTemp2 = wRightInvPrev + wQuotient * wRightInvCur
                    If (dwTemp >= DigitValue.BitHi) OrElse (dwTemp2 >= DigitValue.BitHi) Then Exit Do
                    wLeftInvPrev = wLeftInvCur : wLeftInvCur = CUInt(dwTemp)
                    wRightInvPrev = wRightInvCur : wRightInvCur = CUInt(dwTemp2)
                    dwTemp = dwLeftCur - wQuotient * dwRightCur : dwLeftCur = dwRightCur : dwRightCur = dwTemp
                    fShorthandIterationIsEven = Not fShorthandIterationIsEven
    
                    fIterationIsEven = Not fIterationIsEven
    
                Loop
    
            End If
    
            If (Not fShorthandStep) OrElse (wRightInvPrev = DigitValue.Zero) Then
                ' Longhand step.
    
                uQuotient = ComputeDivisionLonghand(uLeftCur, uRightCur, uTemp) : If uTemp.IsZero Then fResultFound = True : Exit Do
                uRemainder = uTemp
    
                fIterationIsEven = Not fIterationIsEven
                If fComputeLeftInverse Then
                    uTemp = uLeftInvPrev + uQuotient * uLeftInvCur : uLeftInvPrev = uLeftInvCur : uLeftInvCur = uTemp
                End If
                If fComputeRightInverse Then
                    uTemp = uRightInvPrev + uQuotient * uRightInvCur : uRightInvPrev = uRightInvCur : uRightInvCur = uTemp
                End If
                fInvInit = True
    
                uLeftCur = uRightCur : uRightCur = uRemainder
    
            Else
                ' Shorthand step finalization.
    
                If Not fInvInit Then
                    If fComputeLeftInverse Then uLeftInvPrev = wLeftInvPrev : uLeftInvCur = wLeftInvCur
                    If fComputeRightInverse Then uRightInvPrev = wRightInvPrev : uRightInvCur = wRightInvCur
                    fInvInit = True
                Else
                    If fComputeLeftInverse Then ComputeFusedMulMulAdd(uLeftInvPrev, uLeftInvCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur)
                    If fComputeRightInverse Then ComputeFusedMulMulAdd(uRightInvPrev, uRightInvCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur)
                End If
    
                ComputeFusedMulMulSub(uLeftCur, uRightCur, wLeftInvPrev, wLeftInvCur, wRightInvPrev, wRightInvCur, fShorthandIterationIsEven)
    
            End If
    
        Loop
    
        ' Final rounds: numbers are quite short now.
        If Not fResultFound Then
    
            ndxDigitMax = uLeftCur.ValueLength - 1 : fRightIsShorter = (uRightCur.ValueLength < uLeftCur.ValueLength)
            If ndxDigitMax = 0 Then
                dwLeftCur = uLeftCur.ValueDigits(0)
                dwRightCur = uRightCur.ValueDigits(0)
            Else
                dwLeftCur = uLeftCur.ValueDigits(0) Or (CULng(uLeftCur.ValueDigits(1)) << DigitSize.Bits)
                dwRightCur = uRightCur.ValueDigits(0) Or If(fRightIsShorter, DigitValue.Zero, CULng(uRightCur.ValueDigits(1)) << DigitSize.Bits)
            End If
    
            Do While dwLeftCur >= DigitValue.BitHi
    
                Dim dwRemainder As UInt64 = dwLeftCur
    
                If (dwRemainder >> nSubtractionThresholdBits) <= dwRightCur Then
                    wQuotient = DigitValue.Zero
                    Do
                        wQuotient += DigitValue.One : dwRemainder -= dwRightCur
                    Loop While dwRemainder >= dwRightCur
                    dwQuotientMore = wQuotient
                Else
                    dwQuotientMore = dwLeftCur \ dwRightCur
                    dwRemainder = dwLeftCur - dwQuotientMore * dwRightCur
                End If
    
                If dwRemainder = DigitValue.Zero Then fResultFound = True : Exit Do
    
    
                fIterationIsEven = Not fIterationIsEven
                If dwQuotientMore < DigitValue.BitHi Then
                    wQuotient = CUInt(dwQuotientMore)
                    If fComputeLeftInverse Then ComputeFusedMulAdd(uLeftInvPrev, uLeftInvCur, wQuotient)
                    If fComputeRightInverse Then ComputeFusedMulAdd(uRightInvPrev, uRightInvCur, wQuotient)
                Else
                    If fComputeLeftInverse Then
                        uTemp = uLeftInvPrev + dwQuotientMore * uLeftInvCur : uLeftInvPrev = uLeftInvCur : uLeftInvCur = uTemp
                    End If
                    If fComputeRightInverse Then
                        uTemp = uRightInvPrev + dwQuotientMore * uRightInvCur : uRightInvPrev = uRightInvCur : uRightInvCur = uTemp
                    End If
                End If
    
                dwLeftCur = dwRightCur : dwRightCur = dwRemainder
    
            Loop
    
            If fResultFound Then
    
                uRightCur = dwRightCur
    
            Else
    
                ' Final rounds: both numbers have only one digit now, and this digit has MS-bit unset.
                Dim wLeftCur As UInt32 = CUInt(dwLeftCur), wRightCur As UInt32 = CUInt(dwRightCur)
    
                Do
    
                    Dim wRemainder As UInt32 = wLeftCur
    
                    If (wRemainder >> nSubtractionThresholdBits) <= wRightCur Then
                        wQuotient = DigitValue.Zero
                        Do
                            wQuotient += DigitValue.One : wRemainder -= wRightCur
                        Loop While wRemainder >= wRightCur
                    Else
                        wQuotient = wLeftCur \ wRightCur
                        wRemainder = wLeftCur - wQuotient * wRightCur
                    End If
    
                    If wRemainder = DigitValue.Zero Then fResultFound = True : Exit Do
    
                    fIterationIsEven = Not fIterationIsEven
                    If fComputeLeftInverse Then ComputeFusedMulAdd(uLeftInvPrev, uLeftInvCur, wQuotient)
                    If fComputeRightInverse Then ComputeFusedMulAdd(uRightInvPrev, uRightInvCur, wQuotient)
    
                    wLeftCur = wRightCur : wRightCur = wRemainder
    
                Loop
    
                uRightCur = wRightCur
    
            End If
    
    
        End If
    
        If fComputeLeftInverse Then
            uLeftInverse = If(fIterationIsEven, uRight - uLeftInvCur, uLeftInvCur)
        End If
        If fComputeRightInverse Then
            uRightInverse = If(fIterationIsEven, uRightInvCur, uLeft - uRightInvCur)
        End If
    
        Return uRightCur
    End Function
    
    ''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
    Private Shared Sub ComputeFusedMulMulAdd(
            ByRef uLeftInvPrev As BigUInteger, ByRef uLeftInvCur As BigUInteger,
            ByVal wLeftInvPrev As UInt32, ByVal wLeftInvCur As UInt32, ByVal wRightInvPrev As UInt32, ByVal wRightInvCur As UInt32)
    
        Dim ndxDigitMaxPrev As Integer = uLeftInvPrev.ValueLength - 1, ndxDigitMaxCur As Integer = uLeftInvCur.ValueLength - 1,
            ndxDigitMaxNew As Integer = ndxDigitMaxCur + 1
    
        Dim awLeftInvPrev() As UInt32 = uLeftInvPrev.ValueDigits, awLeftInvCur() As UInt32 = uLeftInvCur.ValueDigits
        Dim awLeftInvPrevNew(0 To ndxDigitMaxNew) As UInt32, awLeftInvCurNew(0 To ndxDigitMaxNew) As UInt32
        Dim dwResult As UInt64, wCarryLeftPrev As UInt32 = DigitValue.Zero, wCarryLeftCur As UInt32 = DigitValue.Zero
        Dim wDigitLeftInvPrev, wDigitLeftInvCur As UInt32
    
        For ndxDigit As Integer = 0 To ndxDigitMaxPrev
            wDigitLeftInvPrev = awLeftInvPrev(ndxDigit) : wDigitLeftInvCur = awLeftInvCur(ndxDigit)
    
            dwResult = wCarryLeftPrev + wLeftInvPrev * CULng(wDigitLeftInvPrev) + wRightInvPrev * CULng(wDigitLeftInvCur)
            awLeftInvPrevNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftPrev = CUInt(dwResult >> DigitSize.Bits)
    
            dwResult = wCarryLeftCur + wLeftInvCur * CULng(wDigitLeftInvPrev) + wRightInvCur * CULng(wDigitLeftInvCur)
            awLeftInvCurNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftCur = CUInt(dwResult >> DigitSize.Bits)
    
        Next
    
        If ndxDigitMaxCur > ndxDigitMaxPrev Then
    
            For ndxDigit As Integer = ndxDigitMaxPrev + 1 To ndxDigitMaxCur
                wDigitLeftInvCur = awLeftInvCur(ndxDigit)
    
                dwResult = wCarryLeftPrev + wRightInvPrev * CULng(wDigitLeftInvCur)
                awLeftInvPrevNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftPrev = CUInt(dwResult >> DigitSize.Bits)
    
                dwResult = wCarryLeftCur + wRightInvCur * CULng(wDigitLeftInvCur)
                awLeftInvCurNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarryLeftCur = CUInt(dwResult >> DigitSize.Bits)
    
            Next
    
        End If
    
        If wCarryLeftPrev <> DigitValue.Zero Then awLeftInvPrevNew(ndxDigitMaxNew) = wCarryLeftPrev
        If wCarryLeftCur <> DigitValue.Zero Then awLeftInvCurNew(ndxDigitMaxNew) = wCarryLeftCur
    
        uLeftInvPrev = New BigUInteger(awLeftInvPrevNew) : uLeftInvCur = New BigUInteger(awLeftInvCurNew)
    
    End Sub
    
    ''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
    Private Shared Sub ComputeFusedMulMulSub(
            ByRef uLeftCur As BigUInteger, ByRef uRightCur As BigUInteger,
            ByVal wLeftInvPrev As UInt32, ByVal wLeftInvCur As UInt32, ByVal wRightInvPrev As UInt32, ByVal wRightInvCur As UInt32,
            ByVal fShorthandIterationIsEven As Boolean)
    
        Dim ndxDigitMax As Integer = uLeftCur.ValueLength - 1,
            fRightIsShorter As Boolean = (uRightCur.ValueLength < uLeftCur.ValueLength),
            ndxDigitStop As Integer = If(fRightIsShorter, ndxDigitMax - 1, ndxDigitMax)
    
        Dim awLeftCur() As UInt32 = uLeftCur.ValueDigits, awRightCur() As UInt32 = uRightCur.ValueDigits
        Dim awLeftNew(0 To ndxDigitMax) As UInt32, awRightNew(0 To ndxDigitStop) As UInt32
        Dim iTemp As Int64, wCarryLeft As Int32 = 0I, wCarryRight As Int32 = 0I
        Dim wDigitLeftCur, wDigitRightCur As UInt32
    
        If fShorthandIterationIsEven Then
    
            For ndxDigit As Integer = 0 To ndxDigitStop
                wDigitLeftCur = awLeftCur(ndxDigit) : wDigitRightCur = awRightCur(ndxDigit)
                iTemp = wCarryLeft + CLng(wDigitRightCur) * wRightInvPrev - CLng(wDigitLeftCur) * wLeftInvPrev
                awLeftNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryLeft = CInt(iTemp >> DigitSize.Bits)
                iTemp = wCarryRight + CLng(wDigitLeftCur) * wLeftInvCur - CLng(wDigitRightCur) * wRightInvCur
                awRightNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryRight = CInt(iTemp >> DigitSize.Bits)
            Next
            If fRightIsShorter Then
                wDigitLeftCur = awLeftCur(ndxDigitMax)
                iTemp = wCarryLeft - CLng(wDigitLeftCur) * wLeftInvPrev
                awLeftNew(ndxDigitMax) = CUInt(iTemp And DigitValue.Full)
            End If
    
        Else
    
            For ndxDigit As Integer = 0 To ndxDigitStop
                wDigitLeftCur = awLeftCur(ndxDigit) : wDigitRightCur = awRightCur(ndxDigit)
                iTemp = wCarryLeft + CLng(wDigitLeftCur) * wLeftInvPrev - CLng(wDigitRightCur) * wRightInvPrev
                awLeftNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryLeft = CInt(iTemp >> DigitSize.Bits)
                iTemp = wCarryRight + CLng(wDigitRightCur) * wRightInvCur - CLng(wDigitLeftCur) * wLeftInvCur
                awRightNew(ndxDigit) = CUInt(iTemp And DigitValue.Full) : wCarryRight = CInt(iTemp >> DigitSize.Bits)
            Next
            If fRightIsShorter Then
                wDigitLeftCur = awLeftCur(ndxDigitMax)
                iTemp = wCarryLeft + CLng(wDigitLeftCur) * wLeftInvPrev
                awLeftNew(ndxDigitMax) = CUInt(iTemp And DigitValue.Full)
            End If
    
        End If
    
        uLeftCur = New BigUInteger(awLeftNew) : uRightCur = New BigUInteger(awRightNew)
    
    End Sub
    
    ''' <remarks>All word-sized parameters must have their most-significant bit unset.</remarks>
    Private Shared Sub ComputeFusedMulAdd(ByRef uLeftInvPrev As BigUInteger, ByRef uLeftInvCur As BigUInteger, ByVal wQuotient As UInt32)
    
        Dim ndxDigitPrevMax As Integer = uLeftInvPrev.ValueLength - 1, ndxDigitCurMax As Integer = uLeftInvCur.ValueLength - 1,
            ndxDigitNewMax As Integer = ndxDigitCurMax + 1
        Dim awLeftInvPrev() As UInt32 = uLeftInvPrev.ValueDigits, awLeftInvCur() As UInt32 = uLeftInvCur.ValueDigits,
            awLeftInvNew(0 To ndxDigitNewMax) As UInt32
        Dim dwResult As UInt64 = DigitValue.Zero, wCarry As UInt32 = DigitValue.Zero
    
        For ndxDigit As Integer = 0 To ndxDigitPrevMax
            dwResult = CULng(wCarry) + awLeftInvPrev(ndxDigit) + CULng(wQuotient) * awLeftInvCur(ndxDigit)
            awLeftInvNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarry = CUInt(dwResult >> DigitSize.Bits)
        Next
    
        For ndxDigit As Integer = ndxDigitPrevMax + 1 To ndxDigitCurMax
            dwResult = CULng(wCarry) + CULng(wQuotient) * awLeftInvCur(ndxDigit)
            awLeftInvNew(ndxDigit) = CUInt(dwResult And DigitValue.Full) : wCarry = CUInt(dwResult >> DigitSize.Bits)
        Next
    
        If wCarry <> DigitValue.Zero Then awLeftInvNew(ndxDigitNewMax) = wCarry
    
        uLeftInvPrev = uLeftInvCur : uLeftInvCur = New BigUInteger(awLeftInvNew)
    
    End Sub
    

    If you want to use this code directly, you may need Visual Basic 2012 compiler for some constructs — I didn't check on previous versions; nor am I aware of minimum .Net version (at least 3.5 should suffice); compiled applications are known to run on Mono, although with inferior performance. The only thing I'm absolutely sure about is that one shouldn't try to use automatic VB-to-C# translators, as they are terribly bad in subjects like this; rely on your own head only.