I found some code online that I am trying to work through, which decodes base64. I know Python has base64.urlsafe_b64decode(), but I would like to learn a bit more about what is going on.
The JS atob looks like:
function atob (input) {
  var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
  var str = String(input).replace(/=+$/, '');
  if (str.length % 4 == 1) {
    throw new InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.");
  }
  for (
    // initialize result and counters
    var bc = 0, bs, buffer, idx = 0, output = '';
    // get next character
    buffer = str.charAt(idx++);
    // character found in table? initialize bit storage and add its ascii value;
    ~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
      // and if not first of each 4 characters,
      // convert the first 8 bits to one ascii character
      bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
  ) {
    // try to find character in table (0-63, not found => -1)
    buffer = chars.indexOf(buffer);
  }
  return output;
}
My goal is to port this to Python, but I am trying to understand what the for loop is doing in JavaScript.
It checks whether the value is located in the chars table and then initializes some variables using a ternary like: bs = bc % 4 ? bs * 64 + buffer : buffer, bc++ % 4
I am not quite sure I understand what the buffer, bc++ % 4 part of the ternary is doing. The comma confuses me a bit. Plus, String.fromCharCode(255 & (bs >> (-2 * bc & 6))) is a bit esoteric to me.
I've been trying something like this in Python, which produces some results, albeit different from what the JavaScript implementation produces:
import re

# Test subject
b64_str: str = "fwHzODWqgMH+NjBq02yeyQ=="
# Lookup table for characters
chars: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
# Replace right padding with empty string
replaced = re.sub("=+$", '', b64_str)
if len(replaced) % 4 == 1:
    raise ValueError("atob failed. The string to be decoded is not valid base64")
# Bit storage and counters
bc = 0
out: str = ''
for i in replaced:
    # Get ascii value of character
    buffer = ord(i)
    # If counter is evenly divisible by 4, return buffer as is, else add the ascii value
    bs = bc * 64 + buffer if bc % 4 else buffer
    bc += 1 % 4  # Not sure I understand this part
    # Check if character is in the chars table
    if i in chars:
        # Check if the bit storage and bit counter are non-zero
        if bs and bc:
            # If so, convert the first 8 bits to an ascii character
            out += chr(255 & bs >> (-2 * bc & 6))
        else:
            out = 0
    # Set buffer to the index of where the first instance of the character is in the b64 string
    print(f"before: {chr(buffer)}")
    buffer = chars.index(chr(buffer))
    print(f"after: {buffer}")
print(out)
JS gives ó85ªÁþ60jÓlÉ
Python gives 2:u1(²ë:ð1G>%Y
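On the two expressions that look esoteric: in JS, (a, b) evaluates a, discards its value, and yields b. So (bs = ..., bc++ % 4) first updates bs and then yields the old value of bc % 4, which is zero only for the first character of each group of four, before bc is incremented. The shift -2 * bc & 6 is therefore computed with the already incremented counter. A small sketch that tabulates it, assuming that reading of the loop (the helper name shift_for is made up for illustration and does not appear in the JS):

```python
def shift_for(bc_after_increment: int) -> int:
    """Right-shift applied to bs, given bc AFTER the JS post-increment."""
    # Same expression as the JS: (-2 * bc) & 6, with * binding tighter than &.
    return -2 * bc_after_increment & 6


# Only counters whose old value was not a multiple of 4 emit a byte,
# i.e. bc = 2, 3, 4 in the first group and 6, 7, 8 in the second.
for bc in (2, 3, 4, 6, 7, 8):
    print(f"bc={bc} shift={shift_for(bc)}")
```

After 2, 3, and 4 characters of a group, bs holds 12, 18, and 24 bits. Shifting right by 4, 2, and 0 and masking with 255 picks out the first, second, and third decoded byte of that group.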
Here is a tested version (runnable at https://www.online-python.com/PiseKNFuaO):
import base64


class InvalidCharacterError(Exception):
    pass


def atob(input_str):
    chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
    # Drop trailing padding, as the JS does with replace(/=+$/, '')
    input_str = str(input_str).rstrip('=')
    if len(input_str) % 4 == 1:
        raise InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.")
    output = []
    bc = 0  # number of bits in bs that have not been emitted yet
    bs = 0  # bit storage (accumulator)
    buffer = 0
    for char in input_str:
        # Map the character to its 6-bit value (its index in the table)
        buffer = chars.find(char)
        if buffer == -1:
            raise InvalidCharacterError("'atob' failed: The string to be decoded contains an invalid character.")
        # Append 6 bits to the accumulator
        bs = (bs << 6) + buffer
        bc += 6
        # Once at least 8 bits are pending, emit the oldest 8 as one character
        if bc >= 8:
            bc -= 8
            output.append(chr((bs >> bc) & 255))
    return ''.join(output)
# Compare with Python's built-in Base64 decoding
def test_atob():
    test_strings = [
        "SGVsbG8gd29ybGQ=",  # "Hello world"
        "U29mdHdhcmUgRW5naW5lZXJpbmc=",  # "Software Engineering"
        "VGVzdGluZyAxMjM=",  # "Testing 123"
        "SGVsbG8gd29ybGQ==",  # "Hello world" with extra padding
        "SGVsbG8gd29ybGQ= ",  # "Hello world" with trailing space (invalid)
        "SGVsbG8gd29ybGQ\r\n",  # "Hello world" with newline characters (invalid)
        "Invalid!!==",  # Invalid characters
        "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZyE",  # "This is an encoded string!" without padding
        "U29tZVNwZWNpYWwgQ2hhcnM6ICsgLyA=",  # "SomeSpecial Chars: + / " with padding
    ]
    for encoded in test_strings:
        try:
            expected = base64.b64decode(encoded).decode('utf-8')
            result = atob(encoded)
            print(result == expected, "Custom:", result, "Expected:", expected)
        except Exception as e:
            print(f"Error for string: {encoded} - {e}")


test_atob()
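If the goal is to follow the JS line by line rather than restructure it, a more literal port may be easier to compare against. This is only a sketch under my reading of the loop, not the original author's code: the name atob_literal is made up here, it raises ValueError instead of the JS InvalidCharacterError, and it keeps the JS behaviour of silently skipping characters that are not in the table, unlike the version above, which raises.

```python
import re

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def atob_literal(input_str: str) -> str:
    """Line-by-line port of the JS loop (hypothetical name, sketch only)."""
    s = re.sub(r"=+$", "", str(input_str))
    if len(s) % 4 == 1:
        raise ValueError("'atob' failed: The string to be decoded is not correctly encoded.")

    bc = 0  # count of table characters consumed so far
    bs = 0  # bit storage for the current group of up to 4 characters
    output = []
    for ch in s:                 # JS condition: buffer = str.charAt(idx++)
        buffer = CHARS.find(ch)  # JS body: buffer = chars.indexOf(buffer)
        if buffer == -1:         # JS: ~buffer is 0, so the update does nothing
            continue
        # Left operand of the comma: start a new group or append 6 bits.
        bs = bs * 64 + buffer if bc % 4 else buffer
        # Right operand of the comma: bc++ % 4 yields the OLD bc % 4,
        # and only then increments bc.
        not_first_of_group = bc % 4
        bc += 1
        if not_first_of_group:
            # bc is already incremented here, so the shift is 4, 2, then 0.
            output.append(chr(255 & bs >> (-2 * bc & 6)))
    return "".join(output)


print(repr(atob_literal("SGVsbG8gd29ybGQ=")))
print([hex(ord(c)) for c in atob_literal("fwHz")])
```

The second call decodes the first four characters of the test string from the question to the bytes 0x7f, 0x01, 0xf3; the last of these is the ó that starts the visible JS output. Compared with the attempt in the question, the differences are: buffer must be the table index rather than ord(i), the accumulator is bs * 64 rather than bc * 64, and bc += 1 % 4 is just bc += 1, whereas the JS tests the old bc % 4 before incrementing.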