segmentation-faultmalbolge

Tritwise rotate right and tritwise crazy operation throws a segmentation fault


The tritwise operations, rotating right and the crazy operation, fail to function correctly and throw a segmentation fault in the Malbolge compiler/interpreter.

I've decided to start programming in Malbolge after seeing amazing answers on Coding Challenges and Code Golf and also to learn to program in a hard programming language.

While I was attempting to output a fixed character, I noticed that the * and p (in Normalized Malbolge) were throwing segmentation faults most of the time I was attempting to use them.

I attempted to use the Internet and look up the string (on Google) "'Malbolge' crazy operation 'segfaults'" and "'Malbolge' rotate right 'segfaults'". I also tried using the commands in different contexts and found that it worked if there was no input (which is not what I want).

I am using an online interpreter hosted by tio.run or Try It Online.

The code I have attempted to use:

Normalized Malbolge: /*<

Malbolge: u&a

Try it online!

Normalized Malbolge: /p*<v

Malbolge: u=%`M

Try it online!

Normalized Malbolge: /pp<v

Malbolge: u=<`M

Try it online!

I expected the output of u&a, u=%`M, and u=<`M to be one that does not throw any error, but the actual output is a segmentation fault.

The exact error: /srv/wrappers/malbolge: line 3: 21992 Segmentation fault (core dumped) /opt/malbolge/malbolge .code.tio < .input.tio where 21992 can be any number (most likely in the thousands to ten thousands)


Solution

  • I've been stepping through the Malbolge interpreter in a debugger in order to diagnose what's going on here. I would say "congratulations, you've discovered a bug in the Malbolge interpreter", but given that the spec and interpreter don't match each other in other ways (with the authoritative version normally taken as being the interpreter), and that this is Malbolge, for all I know this is intended behaviour. (OK, it probably isn't actually intended behaviour, but then neither are a number of other features that have come to be treated as important programming techniques.)

    Malbolge stores everything in one big array, both code and data. Commands are intended to modify ("encrypt", in Malbolge terminology) themselves after running, but the interpreter didn't quite end up implementing that correctly: what it actually does is to run a command, then look at the address pointed to by the code pointer and encrypt that. That's why jumping will encrypt the instruction before the jump target, rather than the jump instruction itself.

    If the command that you're running is outside the range 33 to 126 inclusive, the command isn't run (actually, in the version of the Malbolge interpreter I have, the code and data pointers aren't increased either, which seems like it would inevitably lead to an infinite loop; perhaps there are other versions around which fix that issue). That's an important check because the encryption routine simply works by indexing into a lookup table; values outside the 33 to 126 range will end up reading some arbitrary byte of memory from before or after the array.

    Unfortunately, because the code and data are stored together in one big array, a command can end up modifying itself as it runs: it might have been in the 33 to 126 range before running (thus causing the safety check to succeed), but after running, it's out of range and then the encryption will end up doing an out-of-bounds index of the lookup table. The Malbolge interpreter is written in C, which has undefined behaviour for out-of-bounds reads, but for reads that are a very long way out of bounds, a segmentation fault is likely (but not guaranteed) behaviour.

    Let's look at what happens with the code u&a:

    Command   A     C     D   memory
      start   0     0     0   117,    38, 97, 29432, 98, 29431, 98, 29432, 97, ...
          / input   0     0   117,    38, 97, 29432, 98, 29431, 98, 29432, 97, ...
    encrypt input   1     1   111,    38, 97, 29432, 98, 29431, 98, 29432, 97, ...
          * 39378   1     1   111, 39378, 97, 29432, 98, 29431, 98, 29432, 97, ...
    encrypt 39378   2     2   111,   ???, 97, 29432, 98, 29431, 98, 29432, 97, ...
    

    As you can see, you aren't actually rotating the input you loaded into A; the rotate operation reads from the address pointed to by D (not from A), so you're rotating 38 (the in-memory representation of a * command in memory location 1) to produce 39378. That value gets stored into both A and memory location 1. Unfortunately, memory location 1 was the currently executing command, so when it comes to be time to encrypt it, the interpreter does an out-of-bounds read of the lookup table (trying to find the location corresponding to 39378 in a table that only covers the range from 33 to 126), which will produce a segmentation fault if you're lucky.

    This behaviour is highly likely in "simple" Malbolge programs because C and D start out with the same value and increase at the same rate. If you want a rotate instruction to affect something other than the currently running command, you'll have to desync them somehow. The easiest way to do that is normally a j command (note: it might not be particularly easy to make use of the resulting data pointer, but at least it'll probably be somewhere else than the code pointer).

    By the way, actually rotating the user input takes much more effort than in the example programs you've given. You'll have to store it into memory first, and the only operation capable of writing memory dependent on the value of A is p, which requires the cell in question to already have an appropriate value (to avoid losing information, this needs to be 29524, a value that you'll have to produce via Malbolge's arithmetic as it can't be entered as part of the original program, and even then you'll end up swapping 0 trits with 1 trits in the input value). Then you'll have to send the data pointer back to the cell you wrote the input to, so that you can run * on it to rotate it.