I want to convert large positive BigInteger
s (up to many thousands of digits) to strings in Kotlin, using number bases up to 100 (and possibly higher).
For bases ≤36, there is of course toString(base)
, but for larger bases, I wrote this extension function:
fun BigInteger.toStringX(base: Int): String {
if (base <= 36) return toString(base) // improve speed if base <= 36
val bigBase = base.toBigInteger()
var n = this
val stringBuilder = StringBuilder()
while (n > ZERO) {
val div = n.divideAndRemainder(bigBase)
stringBuilder.append(DIGITS[div[1].toInt()])
n = div[0]
}
return stringBuilder.reverse().toString()
}
where DIGITS
is a string containing the list of digits.
Now the native toString
is faster by about an order of magnitude than my function – e.g. 60 ms for about 10,000 digits vs. 500 ms. Why is my function so slow? Any help improving on its speed (while maintinaing the ability to convert to bases > 36) would be appreciated.
(By the way, replacing append()
with insert()
and losing reverse()
in the last line doesn't change much.)
Looking at the source code for the built-in toString
, it seems to call this private toString
, which implements a divide-and-conquer algorithm.
/**
* Converts the specified BigInteger to a string and appends to
* {@code sb}. This implements the recursive Schoenhage algorithm
* for base conversions. This method can only be called for non-negative
* numbers.
* <p>
* See Knuth, Donald, _The Art of Computer Programming_, Vol. 2,
* Answers to Exercises (4.4) Question 14.
*
* @param u The number to convert to a string.
* @param sb The StringBuilder that will be appended to in place.
* @param radix The base to convert to.
* @param digits The minimum number of digits to pad to.
*/
private static void toString(BigInteger u, StringBuilder sb,
int radix, int digits) {
...
results = u.divideAndRemainder(v);
int expectedDigits = 1 << n;
// Now recursively build the two halves of each number.
toString(results[0], sb, radix, digits - expectedDigits);
toString(results[1], sb, radix, expectedDigits);
}
This means that there is only O(log(N)) divisions, for an N-bit number. Compare this to your algorithm, which does O(N) divisions.
So for large numbers, you can implement this algorithm too - "split" the number up into smaller ones, then use your O(N) algorithm when they are small enough, all the while passing the string builder along, so the digits are appended.