
Would it be possible to write a HACK assembler in JACK?


I'm working on the nand2tetris course right now, and I'm currently writing the assembler in Lua. I know that later down the line we will be writing a JACK compiler.

After I'm done with the course and have a HACK computer with an OS and virtual machine, could I write a HACK assembler in JACK, compile that program down to HACK assembly, and then have it assemble itself?
Is that form of boot-strapping possible, or am I flying too close to the sun?


Solution

  • It's common for assemblers and compilers to be self-hosting, i.e. able to compile themselves. And yes, you can start with hand-written machine code to bootstrap up to that point. See "Were the first assemblers written in machine code?" on softwareengineering.SE re: bootstrapping a toolchain.

    e.g. GCC and clang/LLVM are both self-hosting, i.e. they're written in C or C++, and the C-compiler portion of those projects can compile their own source code.

    And yes, "bootstrapping" is the term for this: you start with some existing C compiler that can target the platform you want and use it to compile GCC. Then you use that GCC to make an optimized build of GCC for the same platform, so your compile times are lower. The second build can also enable GCC-internal features or optimizations that rely on GNU C extensions, which the first build may have had to do without if it was made by a plain ISO C compiler (assuming GCC builds with one at all).
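    Here is a toy model of why the stages converge, assuming a correct, deterministic compiler: what a compiler binary does comes from the source it was built from, while the machine code it consists of comes from whichever compiler built it. `Binary` and `compile_with` are hypothetical names for illustration only, not anything in GCC. GCC's bootstrap build performs a real version of the final check: it builds three stages and compares stage 2 against stage 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy stand-in for a compiler executable (hypothetical model)."""
    implements: str  # the source this binary was compiled from, i.e. what it does
    built_by: str    # whose code generator produced this binary's machine code


def compile_with(compiler: Binary, source: str) -> Binary:
    # The output's behaviour comes from the source being compiled;
    # its machine code comes from the compiler doing the compiling.
    return Binary(implements=source, built_by=compiler.implements)


host_cc = Binary(implements="some other C compiler", built_by="unknown")
GCC_SRC = "gcc source"

stage1 = compile_with(host_cc, GCC_SRC)  # GCC's logic, host compiler's codegen
stage2 = compile_with(stage1, GCC_SRC)   # GCC's logic, GCC's codegen
stage3 = compile_with(stage2, GCC_SRC)   # should match stage2 exactly

print(stage1 == stage2)  # False: different compilers built them
print(stage2 == stage3)  # True: the build has reached a fixed point
```

    Stage 1 differs from stage 2 only in how its machine code was produced, so once GCC builds GCC, further rounds change nothing. A mismatch between stages 2 and 3 would point to a bug (or something worse; see below).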

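    JACK is Turing complete (up to the HACK machine's memory limits, like any real computer), so you can write arbitrary programs in it, including one that reads assembly text and writes out machine code. The practical catch is I/O: the standard nand2tetris JACK OS (Math, String, Array, Output, Screen, Keyboard, Memory, Sys) has no file system, so you'd need some other way to get the source in and the machine code out, e.g. typing it at the keyboard or embedding it as string constants, and printing the result to the screen or leaving it in RAM. HACK programs also can't write to instruction ROM, so running the freshly assembled code means loading it into ROM through the emulator. With that sorted out, there's no problem.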
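    To make "reads text and writes machine code" concrete, here is a minimal two-pass HACK assembler sketch. It's in Python for brevity, not JACK, and it assumes well-formed input (no error handling); `assemble`, `clean`, and `dest_bits` are illustrative names, not from the course materials. A JACK port would need the same pieces: a symbol table, string parsing, and lookup tables for the comp/dest/jump fields.

```python
# comp field: mnemonic -> a-bit + c1..c6. Mnemonics that use M set a=1.
COMP = {
    "0": "0101010", "1": "0111111", "-1": "0111010",
    "D": "0001100", "A": "0110000", "!D": "0001101", "!A": "0110001",
    "-D": "0001111", "-A": "0110011", "D+1": "0011111", "A+1": "0110111",
    "D-1": "0001110", "A-1": "0110010", "D+A": "0000010", "D-A": "0010011",
    "A-D": "0000111", "D&A": "0000000", "D|A": "0010101",
    "M": "1110000", "!M": "1110001", "-M": "1110011", "M+1": "1110111",
    "M-1": "1110010", "D+M": "1000010", "D-M": "1010011", "M-D": "1000111",
    "D&M": "1000000", "D|M": "1010101",
}

# jump field: j1 j2 j3 (empty string means no jump).
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}

PREDEFINED = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
              "SCREEN": 16384, "KBD": 24576,
              **{f"R{i}": i for i in range(16)}}


def dest_bits(dest: str) -> str:
    """d1 d2 d3 = store into A, D, M respectively (any letter order works)."""
    return "".join("1" if reg in dest else "0" for reg in "ADM")


def clean(source: str) -> list[str]:
    """Strip // comments and all whitespace, dropping empty lines."""
    lines = []
    for line in source.splitlines():
        code = "".join(line.split("//", 1)[0].split())
        if code:
            lines.append(code)
    return lines


def assemble(source: str) -> list[str]:
    """Translate HACK assembly text into a list of 16-bit binary strings."""
    lines = clean(source)
    symbols = dict(PREDEFINED)

    # Pass 1: bind each (LABEL) to the ROM address of the next real instruction.
    rom_address = 0
    for code in lines:
        if code.startswith("("):
            symbols[code[1:-1]] = rom_address
        else:
            rom_address += 1

    # Pass 2: encode instructions; new variables get RAM[16], RAM[17], ...
    output = []
    next_variable = 16
    for code in lines:
        if code.startswith("("):
            continue  # labels generate no code
        if code.startswith("@"):
            name = code[1:]
            if name.isdigit():
                value = int(name)
            else:
                if name not in symbols:
                    symbols[name] = next_variable
                    next_variable += 1
                value = symbols[name]
            # A-instruction: top bit 0, value must fit in 15 bits.
            output.append(format(value, "016b"))
        else:
            # C-instruction dest=comp;jump, where "dest=" and ";jump" are optional.
            dest, _, rest = code.rpartition("=")
            comp, _, jump = rest.partition(";")
            output.append("111" + COMP[comp] + dest_bits(dest) + JUMP[jump])
    return output


if __name__ == "__main__":
    print("\n".join(assemble("@2\nD=A\n@3\nD=D+A\n@0\nM=D\n")))
```

    Because the output is fully determined by the input, a correct JACK port should produce exactly the same `.hack` text as your Lua assembler on the same `.asm` file. That gives you a simple check once the JACK version can assemble its own compiled output.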


    Also related: "How was the first assembler for a new home computer platform written?" on retrocomputing.SE describes how most toolchains for 8-bit micros were bootstrapped on other machines: often university minicomputers or mainframes that enterprising fellows like Gates and Allen had access to, or any other system that could output a format you could program a ROM with, given some hardware hacking. Or just by hand in hex, as Wozniak apparently did for Apple's 6502 machine code.



    Food for thought: a compiler that compiles itself opens up an interesting loophole. A malicious version can hide code in the compiler binary and propagate it to future versions by recognizing when it's compiling itself and re-emitting that machine code. We don't think any current binary distributions of GCC, clang, or other self-hosting compilers have this going on, but rebuilding from audited source isn't enough to verify that, because the compiler binary doing the rebuild could itself be compromised. A wiki.c2.com article has more info, including that it was seen in the wild in Delphi 4 through 7, detected in 2009.
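    To see why rebuilding from audited source doesn't help, here is a deliberately oversimplified model. It skips the hard part (actually recognizing the target source and patching machine code) and only tracks whether a payload is present. `Binary`, `run_compiler`, and the source strings are hypothetical names; the login program as a second target follows Ken Thompson's original example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    source: str     # the source this binary was compiled from
    trojaned: bool  # whether hidden code ended up in its machine code anyway


COMPILER_SRC = "compiler source"  # hypothetical, fully audited and clean
LOGIN_SRC = "login source"        # hypothetical, fully audited and clean


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Toy model: compile `source` using the executable `compiler`."""
    if compiler.trojaned and source in (COMPILER_SRC, LOGIN_SRC):
        # A trojaned compiler recognises its targets, including its own
        # source, and re-inserts the payload even though the source is clean.
        return Binary(source, trojaned=True)
    return Binary(source, trojaned=False)


# One trojaned binary, built once from malicious source that was then deleted.
evil = Binary("malicious compiler source", trojaned=True)

gen1 = run_compiler(evil, COMPILER_SRC)  # rebuilt from clean, audited source...
gen2 = run_compiler(gen1, COMPILER_SRC)  # ...and rebuilt again
login = run_compiler(gen2, LOGIN_SRC)

print(gen1.trojaned, gen2.trojaned, login.trojaned)  # True True True
print(run_compiler(Binary(COMPILER_SRC, False), COMPILER_SRC).trojaned)  # False
```

    Every generation built from the clean source inherits the payload; only a compiler binary with no trojaned ancestor produces a clean result.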

    The idea was made famous by Ken Thompson in his 1984 Turing Award lecture, "Reflections on Trusting Trust" (he credits an earlier U.S. Air Force security evaluation of Multics for it). The same concern extends to the rest of the toolchain, including yacc, lex, and linkers. See also https://security.stackexchange.com/questions/222072/are-compilers-safe

    The actual security considerations aren't the point I'm trying to make with this section. Go read that article and take in the way it describes how a compiler gets used to compile the next version of itself; that might help wrap your head around how this works.

    (But re: the security implications, there have been some Q&As on security.SE and elsewhere: one, two, three. Ultimately you have to trust something, unless you build your own CPU with your bare hands, starting from raw sand. Or at least design the lithography machine and chip layout yourself; someone else could do the chemistry of preparing the silicon and doping materials.)