The following quote is from the reference manual:
We can specify any byte in a short literal string by its numeric value (including embedded zeros). This can be done (...) with the escape sequence \ddd, where ddd is a sequence of up to three decimal digits.
I don't understand the rationale behind this decision. Why use decimal if most other languages (e.g. Perl, Python, C, Java, JS) use octal?
In all likelihood, this is simply to stay consistent with the language itself, which only supports decimal and hexadecimal numeric constants. The part elided from your quote describes hexadecimal escape sequences, and is followed by a description of UTF-8 escape sequences:
We can specify any byte in a short literal string, including embedded zeros, by its numeric value. This can be done with the escape sequence \xXX, where XX is a sequence of exactly two hexadecimal digits, or with the escape sequence \ddd, where ddd is a sequence of up to three decimal digits. (Note that if a decimal escape sequence is to be followed by a digit, it must be expressed using exactly three digits.)
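For instance, the two in-range forms are interchangeable ways of spelling the same byte (a sketch against a stock Lua interpreter):

```lua
-- 65 decimal is 0x41 hexadecimal: the ASCII code for 'A'.
print("\65\66\67" == "ABC")    --> true
print("\x41\x42\x43" == "ABC") --> true
-- A decimal escape followed by a digit must use all three digits,
-- otherwise the trailing digit would be absorbed into the escape:
print("\0651" == "A1")         --> true
```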
The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (with mandatory enclosing braces), where XXX is a sequence of one or more hexadecimal digits representing the character code point. This code point can be any value less than 2³¹. (Lua uses the original UTF-8 specification here, which is not restricted to valid Unicode code points.)
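A quick illustration of that escape (a sketch, assuming Lua 5.3 or later, where \u{} is available):

```lua
-- U+0048 'H', U+0049 'I': the \u{} escape takes hexadecimal code points.
print("\u{48}\u{49}" == "HI") --> true
-- Code points need not be valid Unicode; anything below 2^31 encodes,
-- using up to six bytes under the original UTF-8 scheme:
print(#"\u{7FFFFFFF}")        --> 6
```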
This is a bit of an ouroboric rationale, as an easy follow-up question is, well, why does the language only support decimal and hexadecimal numeric constants? Possibly just for simplicity's sake, in general use and in the parser - a chief goal of Lua's design is to be lightweight, after all. Or is it because the language doesn't support octal escape sequences? Uh oh.
I am not sure if there is a definitive rationale to be found without contacting the authors.
(The only mention of octal anywhere in the manual is in an example of pattern matching octal digits (Lua Patterns; an ironic example?).)
Besides that, octal is somewhat superfluous when describing a single byte (8 bits; an octet), which has a maximum octal value of 377. Octal 400 (and greater) would be out of range. 9 bits is an awkward number of bits when your strings are 8-bit clean:
The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string. The length of any string in Lua must fit in a Lua integer.
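A short demonstration of that 8-bit cleanliness (a sketch, using only the standard string library):

```lua
local s = "a\0b\0c"    -- decimal escape \0: an embedded zero byte
print(#s)              --> 5 (embedded zeros are ordinary bytes)
print(s:byte(1, -1))   --> 97  0  98  0  99
print(s == "a\0b\0c")  --> true (comparison is byte-wise, not C-string-wise)
```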
This little out-of-range fact proves itself a design challenge for all of the languages you mention:
C compilers issue diagnostics for character constants and string literals, when the value of an octal escape sequence exceeds the width of the underlying character type.
Python currently (3.13.3) issues a warning for an octal escape sequence value greater than 0o377, with plans to elevate this to an error in the future.
JavaScript has deprecated the use of octal escape sequences in strings (and regular expressions).
Perl interprets the character in different character sets based on its value.
This Q&A describes the specific limitations of octal escape sequences in Java far better than I can (and links through to the Wikipedia article on Octal, In computers which highlights octal's relative obscurity in a modern world of machine words not divisible by three). It also frames the inclusion of octal escape sequences as just another bit of C's legacy (which is probably, at least partially, true for most languages derived from C; but curious that Lua, being so tightly coupled to C, breaks free of this).
With all that said, of course decimal 256 (and greater) is also out of range of a single byte, and Lua (un)happily complains of this as well:
> "\256"
stdin:1: decimal escape too large near '"\256"'
so that kind of throws a wrench in the octal-range-argument-machine.
Only hexadecimal remains undefeated for neatly describing a single byte. Embrace the nibble.
More to the point, though, is that if you only need to represent 256 values, the decimal values 0 to 255 work rather well.
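Either way, a stock interpreter treats the in-range escape forms identically (a small sketch):

```lua
-- Hexadecimal \xFF and decimal \255 name the same byte, the maximum:
print("\xFF" == "\255")           --> true
print(string.char(255) == "\255") --> true
-- And the full byte range fits comfortably in three decimal digits:
print("\000" == "\0")             --> true
```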