I am trying to convert a regular (signed) Long
to a ByteArray
in Kotlin, such that the lexicographic order of the resulting ByteArray
s is the same as the order of the input Long
values.
Consider this test case:
class LongToByteArrayTest {
@Test
fun canConvertLongToByteArrayPreservingOrder(){
// some long values to test with
val longs = mutableListOf(
Long.MIN_VALUE,
Long.MAX_VALUE,
0, -1, +1, -2, +2, -10, +10, -42, +42,
Long.MAX_VALUE / 2, Long.MIN_VALUE / 2
)
longs.sort()
for ((lower, upper) in longs.windowed(size = 2)) {
println("Lower: ${lower}, Upper: ${upper}")
assert(lower < upper) { "Fail: long values are out of order!" }
val lowerBytes = toStableBytes(lower)
val upperBytes = toStableBytes(upper)
assert(compareBytesLex(lowerBytes, upperBytes) < 0){ "Fail: byte arrays are out of order!" }
}
}
fun compareBytesLex(byteArray1: ByteArray, byteArray2: ByteArray): Int {
val min = min(byteArray1.size, byteArray2.size)
for (i in 0 until min) {
val myByteUnsigned = byteArray1[i].toInt() and 0xff
val otherByteUnsigned = byteArray2[i].toInt() and 0xff
val cmp = myByteUnsigned - otherByteUnsigned
if (cmp != 0) {
return cmp
}
}
return byteArray1.size - byteArray2.size
}
fun toStableBytes(value: Long) : ByteArray {
TODO("how do we do this?")
}
}
Essentially, I'm looking for a method body for toStableBytes
which makes the test pass. Ideally, the byte array for a single long should be exactly 8 bytes in length.
I tried ByteBuffer#putLong(...)
, DataOutputStream#writeLong(...)
and some other solutions provided by the JDK, but unfortunately none of them provides order-preserving byte arrays. I also tried manual little-endian-encoding, but this also doesn't have the desired property.
Ok, courtesy of Exodus LongBinding I've found a solution which seems to work:
fun writeLong(out: OutputStream, long: Long) {
writeUnsignedLong(out, long xor Long.MIN_VALUE)
}
private fun writeUnsignedLong(out: OutputStream, long: Long) {
out.write((long ushr 56).toByte().toInt())
out.write((long ushr 48).toByte().toInt())
out.write((long ushr 40).toByte().toInt())
out.write((long ushr 32).toByte().toInt())
out.write((long ushr 24).toByte().toInt())
out.write((long ushr 16).toByte().toInt())
out.write((long ushr 8).toByte().toInt())
out.write(long.toByte().toInt())
}
fun readLong(stream: InputStream): Long {
return readUnsignedLong(stream) xor Long.MIN_VALUE
}
private fun readUnsignedLong(stream: InputStream): Long {
val c1 = stream.read().toLong()
val c2 = stream.read().toLong()
val c3 = stream.read().toLong()
val c4 = stream.read().toLong()
val c5 = stream.read().toLong()
val c6 = stream.read().toLong()
val c7 = stream.read().toLong()
val c8 = stream.read().toLong()
if ((c1 or c2 or c3 or c4 or c5 or c6 or c7 or c8) < 0) {
throw IndexOutOfBoundsException()
}
return (c1 shl 56) or (c2 shl 48) or (c3 shl 40) or (c4 shl 32) or
(c5 shl 24) or (c6 shl 16) or (c7 shl 8) or c8
}
Basically what this code is doing is to flip the first bit in the Long
(by doing an xor
with Long.MIN_VALUE
) and then simply writing those bits, one byte at a time, from highest to lowest. Flipping the first bit ensures that negative values (in binary representation) are lexicographically smaller than positive values. It's actually a relatively simple trick, but very neat. The read process is then basically just the inverse. Using a ByteArrayOutputStream
and this algorithm, the test case in the opening post passes.