
Encode String to Base36


Currently I am working on an algorithm to encode an arbitrary string, which may contain any character, as a Base36 string.

I have tried the following but it doesn't work.

public static String encode(String str) {
    return new BigInteger(str, 16).toString(36);
}

I guess it's because the string is not a hex string: new BigInteger(str, 16) only accepts hexadecimal digits. If I pass the string "Hello22334!", I get a NumberFormatException.

My approach would be to convert each character to a number, convert those numbers to their hexadecimal representation, and then convert the resulting hex string to Base36.
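A minimal sketch of this hex-based approach, assuming UTF-8 as the character-to-number mapping (the class and method names are illustrative). It suggests the idea does work, and that it yields the same value as building the BigInteger directly from the bytes, which would make the hex step redundant.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class HexRouteBase36 {
    // Characters -> UTF-8 bytes -> hex string -> base 36 (assumes UTF-8).
    static String encodeViaHex(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            // Two hex digits per byte, e.g. 'H' (0x48) becomes "48"
            hex.append(String.format("%02x", b & 0xff));
        }
        if (hex.length() == 0) {
            return "0"; // empty input: parsing "" with radix 16 would throw
        }
        // The hex string contains only [0-9a-f], so radix-16 parsing succeeds
        return new BigInteger(hex.toString(), 16).toString(36);
    }

    public static void main(String[] args) {
        String viaHex = encodeViaHex("Hello22334!");
        // Same number without the hex detour: the UTF-8 bytes as an unsigned value
        String direct = new BigInteger(1, "Hello22334!".getBytes(StandardCharsets.UTF_8)).toString(36);
        System.out.println(viaHex);
        System.out.println(viaHex.equals(direct)); // prints true
    }
}
```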

Is my approach okay or is there a simpler or better way?


Solution

  • First, convert your string to a number, represented as a sequence of bytes. That is what a character encoding is for; I highly recommend UTF-8.

    Then convert that number (the byte array) to a base-36 string.

    byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
    // signum 1: interpret the bytes as a positive (unsigned) number
    String base36 = new BigInteger(1, bytes).toString(36);
    

    To decode:

    byte[] bytes = new BigInteger(base36, 36).toByteArray();
    // Thanks to @Alok for pointing out the need to remove leading zeroes:
    // toByteArray() may prepend a 0x00 sign byte.
    int zeroPrefixLength = zeroPrefixLength(bytes);
    String string = new String(bytes, zeroPrefixLength, bytes.length - zeroPrefixLength, StandardCharsets.UTF_8);
    

    // Number of leading 0x00 bytes (the whole length if every byte is zero)
    private static int zeroPrefixLength(final byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != 0) {
                return i;
            }
        }
        return bytes.length;
    }
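Putting both halves together, here is a minimal self-contained sketch (the class name Base36Codec is illustrative, and a simple loop stands in for the zeroPrefixLength helper). It also prints what toByteArray() returns for a non-ASCII character, which shows the extra sign byte that the leading-zero removal deals with.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Base36Codec {
    // Encode: UTF-8 bytes read as one unsigned big-endian number, printed in radix 36
    static String encode(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        // signum 1 treats the bytes as a positive magnitude, so a high first byte
        // (e.g. 0xC3) does not make the number negative
        return new BigInteger(1, bytes).toString(36);
    }

    // Decode: parse radix 36, get the bytes back, skip leading zero bytes
    static String decode(String base36) {
        byte[] bytes = new BigInteger(base36, 36).toByteArray();
        // toByteArray() returns two's complement, so it prepends a 0x00 sign byte
        // whenever the top bit of the first magnitude byte is set
        int start = 0;
        while (start < bytes.length && bytes[start] == 0) {
            start++;
        }
        return new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));  // 0x41 = 65 = 1*36 + 29, prints 1t
        System.out.println(encode("AB")); // 0x4142 = 16706, prints cw2

        // U+00E9 is C3 A9 in UTF-8; note the extra leading 0 sign byte
        byte[] raw = new BigInteger(1, "\u00e9".getBytes(StandardCharsets.UTF_8)).toByteArray();
        System.out.println(Arrays.toString(raw)); // prints [0, -61, -87]

        for (String s : new String[] {"Hello22334!", "\u00e9", "\u65e5\u672c", ""}) {
            System.out.println(s.equals(decode(encode(s)))); // prints true
        }

        // Limitation: a leading NUL character is a leading zero byte, so it is lost
        System.out.println(decode(encode("\u0000A"))); // prints A
    }
}
```

One consequence of treating the bytes as a number: leading NUL characters cannot survive the round trip, because leading zero bytes do not change the value. If that matters, the original length has to be stored separately.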