encodingcharacter-encodingbase62

Unable to Encode Base62 by Hand


I'm trying to Base62 encode the string Hi by hand. My process is as follows:

Convert each character of "cat" into its respective ascii value:

H -> 99  -> 01001000
i -> 105 -> 01101001

reorganized into 6 bit segments:

010010 000110 1001

x for padding:

010010 000110 1001xx

Reference the Base62 index table to convert to Base62:

010010 -> I
000110 -> 6
1001xx -> a
xxxxxx -> 0

This gives me I6a0, which I know is wrong because online encoders instead will give me 4oz. What am I doing wrong here? What is the correct way to do this?


Solution

  • Mechanically splitting the bit string into six-bit chunks is correct for base64 because it is equivalent to division by 64; but you need division by 62.

    In other words, what would happen if the six bits were 111111? You could not handle that with your logic, because there are only 62 symbols to choose from, and yet, this would need the 64th symbol.

    In still other words, the sequence of bytes can be interpreted as a big integer (though not so very big in your case, when there are only two input bytes). Starting from n=2 (because the entire number is less than 623), divide it by 62n; the integer quotient produces the first digit (you can look it up in the table on Wikipedia you already linked to) and repeat with the remainder with n-=1, like any base-x conversion (where x = 62 in this case) until you reach n=0.

    The following calculation uses simple integer division, like you probably learned it in school before you learned about fractions. For example, 11/4 would yield the integer result 2 and a remainder of 3 (because 11 - 4*2 = 3) rather than the fractional result 2.75.

    Your input converted to a single integer is 18537 (hex 0x4869).

    Dividing that by 3844 (622) yields 4, leaving 3161 in the remainder.

    (4, conveniently, is also the symbol for 4 in base62.)

    3161 divided by 62 is 50, leaving 61 as the remainder; symbol 50 in the table is o, and symbol 61 is z.