
Why does Lua use decimal instead of octal in its '\ddd' escape sequence?


The following quote is from the reference manual:

We can specify any byte in a short literal string by its numeric value (including embedded zeros). This can be done (...) with the escape sequence \ddd, where ddd is a sequence of up to three decimal digits.

I don't understand the rationale behind this decision. Why use decimal if most other languages (e.g. Perl, Python, C, Java, JS) use octal?


Solution

  • In all likelihood, this is simply to stay consistent with the language itself, which only supports decimal and hexadecimal numeric constants. The part elided from your quote describes hexadecimal escape sequences, and it is followed by a description of UTF-8 escape sequences:

    We can specify any byte in a short literal string, including embedded zeros, by its numeric value. This can be done with the escape sequence \xXX, where XX is a sequence of exactly two hexadecimal digits, or with the escape sequence \ddd, where ddd is a sequence of up to three decimal digits. (Note that if a decimal escape sequence is to be followed by a digit, it must be expressed using exactly three digits.)
    The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (with mandatory enclosing braces), where XXX is a sequence of one or more hexadecimal digits representing the character code point. This code point can be any value less than 2^31. (Lua uses the original UTF-8 specification here, which is not restricted to valid Unicode code points.)
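    To make the escape forms concrete, here is a small sketch (assuming Lua 5.3 or later, which the \u{XXX} escape requires) that spells the same one-byte string in each form and exercises the three-digit rule from the note above. Compiling "\651" through load is just a way to observe the error without aborting the script.

    ```lua
    -- The same one-byte string "A" (byte value 65), spelled three ways
    local dec = "\65"     -- decimal escape, up to three digits
    local hex = "\x41"    -- hexadecimal escape, exactly two digits
    local uni = "\u{41}"  -- UTF-8 escape; code point 65 encodes as one byte
    assert(dec == "A" and hex == "A" and uni == "A")

    -- A decimal escape followed by a digit must use all three digits:
    -- "\0651" is byte 65 followed by the character "1"
    local padded = "\0651"
    assert(padded == "A1")

    -- Without the padding, "\651" is read as decimal 651, which is rejected
    -- when the chunk is compiled, so load() returns nil plus a message
    local chunk, err = load('return "\\651"')
    assert(chunk == nil and err ~= nil)
    ```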

    The consistency argument is a bit of an ouroboric rationale, since the obvious follow-up question is: why does the language only support decimal and hexadecimal numeric constants? Possibly just for simplicity's sake, both in general use and in the parser; a chief goal of Lua's design is to be lightweight, after all. Or is it because the language doesn't support octal escape sequences? Uh oh.

    I am not sure if there is a definitive rationale to be found without contacting the authors.

    (The only mention of octal anywhere in the manual is an example in the section on Lua patterns, where [0-7] represents the octal digits. An ironic example?)


    Besides that, octal is a poor fit for describing a single byte (8 bits; an octet), whose maximum value is octal 377. Three octal digits encode 9 bits (up to octal 777, decimal 511), so octal 400 and above would be out of range. Nine bits is an awkward number of bits when your strings are 8-bit clean:

    The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string. The length of any string in Lua must fit in a Lua integer.
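    The range arithmetic behind this argument can be checked directly. A sketch, assuming Lua 5.3 or later: tonumber with base 8 converts the octal strings, and string.char stands in for "a single byte", since it rejects any value outside 0 to 255.

    ```lua
    -- Three octal digits span 8^3 = 512 values (0 to 511, i.e. 9 bits),
    -- while a byte holds only 2^8 = 256 values (0 to 255)
    local max_byte      = tonumber("377", 8)  -- 255
    local first_too_big = tonumber("400", 8)  -- 256
    local max_octal3    = tonumber("777", 8)  -- 511
    assert(max_byte == 255 and first_too_big == 256 and max_octal3 == 511)

    -- string.char accepts byte values only, so octal 400 has no byte to map to
    assert(pcall(string.char, max_byte))
    assert(not pcall(string.char, first_too_big))

    -- 8-bit clean: an embedded zero is an ordinary byte and counts toward the length
    local s = "a\0b"
    assert(#s == 3 and s:byte(2) == 0)
    ```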

    This little out-of-range fact proves itself a design challenge for all of the languages you mention.

    With all that said, decimal 256 (and greater) is of course also out of range for a single byte, and Lua (un)happily complains about this as well:

    > "\256"
    stdin:1: decimal escape too large near '"\256"'
    

    so that kind of throws a wrench in the octal-range-argument-machine.
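    The same check can be done from inside a script. This sketch (Lua 5.3 or later assumed; try is just a hypothetical helper name) compiles each literal with load, so the out-of-range literal produces an error message instead of stopping the script. The exact wording of the message differs between Lua versions, so only the "too large" part is checked.

    ```lua
    -- Hypothetical helper: compile a literal at run time and return its value,
    -- or nil plus the compiler's error message
    local function try(literal)
      local chunk, err = load("return " .. literal)
      if chunk then return chunk() end
      return nil, err
    end

    -- Long brackets keep the backslashes literal, so load() sees "\255" and "\256"
    assert(try([["\255"]]) == "\255")   -- largest byte value: accepted

    local value, err = try([["\256"]])  -- one past a byte: rejected
    assert(value == nil)
    assert(err:find("too large", 1, true))
    ```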

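    Only hexadecimal remains undefeated for neatly describing a single byte: two hex digits, one per 4-bit nibble, cover exactly 00 to FF, which is why \xXX always takes exactly two digits. Embrace the nibble.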
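    A quick exhaustive check of the nibble claim, as a sketch assuming Lua 5.3 or later (for the // integer division operator): every byte value formats as exactly two hex digits, one per nibble, and each \xXX escape compiles back to the same byte.

    ```lua
    for b = 0, 255 do
      local hex = string.format("%02X", b)
      -- Always exactly two digits: no padding rules, no overflow case
      assert(#hex == 2)
      -- The digits are just the high and low nibbles of the byte
      local high, low = b // 16, b % 16
      assert(hex == string.format("%X%X", high, low))
      -- The \xXX escape built from them yields the original byte
      assert(load('return "\\x' .. hex .. '"')() == string.char(b))
    end
    ```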

    More to the point, though, is that if you only need to represent 256 values, the decimal values 0 to 255 work rather well.
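    In practice, that also makes decimal escapes easy to generate. The following is a hypothetical helper (escape_decimal is not a standard function), sketched for Lua 5.3 or later: it escapes every byte outside printable ASCII, plus the double quote and backslash, as \ddd. It always emits three digits because, per the manual note quoted above, a shorter escape would swallow a following digit.

    ```lua
    -- Hypothetical helper: rewrite s so it can be pasted inside a "..." literal
    local function escape_decimal(s)
      return (s:gsub(".", function(c)
        local b = c:byte()
        if b < 32 or b > 126 or c == '"' or c == "\\" then
          -- Always three digits, so a following digit is never absorbed
          return string.format("\\%03d", b)
        end
        -- Returning nothing keeps the original character
      end))
    end

    local raw = "\1" .. "2" .. "\255"
    local lit = escape_decimal(raw)
    print(lit)  --> \0012\255
    -- Round trip: the escaped text compiles back to the original bytes
    assert(load('return "' .. lit .. '"')() == raw)
    ```

    Padding to three digits costs a byte or two per escape, but it keeps the helper free of lookahead logic.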