kotliniobyte

How to convert a signed Long in Kotlin to a ByteArray while preserving sort order?


I am trying to convert a regular (signed) Long to a ByteArray in Kotlin, such that the lexicographic order of the resulting ByteArrays is the same as the order of the input Long values.

Consider this test case:

class LongToByteArrayTest {

   @Test
   fun canConvertLongToByteArrayPreservingOrder(){
        // some long values to test with
        val longs = mutableListOf(
            Long.MIN_VALUE, 
            Long.MAX_VALUE, 
            0, -1, +1, -2, +2, -10, +10, -42, +42, 
            Long.MAX_VALUE / 2, Long.MIN_VALUE / 2
        )
        longs.sort()

        for ((lower, upper) in longs.windowed(size = 2)) {
            println("Lower: ${lower}, Upper: ${upper}")
            assert(lower < upper) { "Fail: long values are out of order!" }

            val lowerBytes = toStableBytes(lower)
            val upperBytes = toStableBytes(upper)

            assert(compareBytesLex(lowerBytes, upperBytes) < 0){ "Fail: byte arrays are out of order!" }
        }
   }


   fun compareBytesLex(byteArray1: ByteArray, byteArray2: ByteArray): Int {
        val min = min(byteArray1.size, byteArray2.size)

        for (i in 0 until min) {
            val myByteUnsigned = byteArray1[i].toInt() and 0xff
            val otherByteUnsigned = byteArray2[i].toInt() and 0xff
            val cmp = myByteUnsigned - otherByteUnsigned
            if (cmp != 0) {
                return cmp
            }
        }

        return byteArray1.size - byteArray2.size
   }

   fun toStableBytes(value: Long) : ByteArray {
       TODO("how do we do this?")
   }

}

Essentially, I'm looking for a method body for toStableBytes which makes the test pass. Ideally, the byte array for a single long should be exactly 8 bytes in length.

I tried ByteBuffer#putLong(...), DataOutputStream#writeLong(...) and some other solutions provided by the JDK, but unfortunately none of them provides order-preserving byte arrays. I also tried manual little-endian-encoding, but this also doesn't have the desired property.


Solution

  • Ok, courtesy of Exodus LongBinding I've found a solution which seems to work:

        fun writeLong(out: OutputStream, long: Long) {
            writeUnsignedLong(out, long xor Long.MIN_VALUE)
        }
    
        private fun writeUnsignedLong(out: OutputStream, long: Long) {
            out.write((long ushr 56).toByte().toInt())
            out.write((long ushr 48).toByte().toInt())
            out.write((long ushr 40).toByte().toInt())
            out.write((long ushr 32).toByte().toInt())
            out.write((long ushr 24).toByte().toInt())
            out.write((long ushr 16).toByte().toInt())
            out.write((long ushr 8).toByte().toInt())
            out.write(long.toByte().toInt())
        }
    
        fun readLong(stream: InputStream): Long {
            return readUnsignedLong(stream) xor Long.MIN_VALUE
        }
    
        private fun readUnsignedLong(stream: InputStream): Long {
            val c1 = stream.read().toLong()
            val c2 = stream.read().toLong()
            val c3 = stream.read().toLong()
            val c4 = stream.read().toLong()
            val c5 = stream.read().toLong()
            val c6 = stream.read().toLong()
            val c7 = stream.read().toLong()
            val c8 = stream.read().toLong()
            if ((c1 or c2 or c3 or c4 or c5 or c6 or c7 or c8) < 0) {
                throw IndexOutOfBoundsException()
            }
            return (c1 shl 56) or (c2 shl 48) or (c3 shl 40) or (c4 shl 32) or
                (c5 shl 24) or (c6 shl 16) or (c7 shl 8) or c8
        }
    

    Basically what this code is doing is to flip the first bit in the Long (by doing an xor with Long.MIN_VALUE) and then simply writing those bits, one byte at a time, from highest to lowest. Flipping the first bit ensures that negative values (in binary representation) are lexicographically smaller than positive values. It's actually a relatively simple trick, but very neat. The read process is then basically just the inverse. Using a ByteArrayOutputStream and this algorithm, the test case in the opening post passes.