gcc compilation compiler-construction compiled-language

Do new compilers turn the source code into a binary, or do they just transpile it and hand it over to another compiler?


Do new compilers (e.g. the Rust compiler or the Zig compiler) turn the source code into another language (e.g. C) and let that language's compiler (e.g. gcc) produce the executable, or do they do everything by themselves? I'd assume they use another compiler, because writing one seems to be very hard. By "compiler" I mean the ones that produce executables, not a language's bytecode (so not Java, because it produces Java bytecode, nor C#, because it produces IL).


Solution

  • Compilation to a Target Language

    Compilers use several strategies for compiling source code, but they all boil down to translating it into a target language.

    Take C as an example: a C compiler such as gcc translates the source code to assembly, which is then assembled into object files and linked into an executable. In this case, assembly language is the target.

    Java is similar: Java source code is translated into bytecode instructions, which are executed by the Java Virtual Machine (JVM). In this case, the bytecode is the target language.
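
    To make the "source in, target language out" idea concrete, here is a minimal sketch of a toy compiler. It assumes a made-up stack machine with `PUSH`, `ADD`, `SUB`, and `MUL` instructions (these names are hypothetical and are not real JVM bytecode), and it borrows Python's `ast` module as the parser so the example stays short.

```python
import ast

# Hypothetical instruction names for a toy stack machine (not JVM bytecode).
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Translate an integer arithmetic expression into toy bytecode."""

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order walk: both operands first, then the operator.
            return emit(node.left) + emit(node.right) + [(OPS[type(node.op)], None)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return emit(ast.parse(source, mode="eval").body)


def run(bytecode):
    """A tiny 'virtual machine' that executes the toy bytecode."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
            continue
        right, left = stack.pop(), stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


program = compile_expr("(2 + 3) * 4")
print(program)
print(run(program))
```

    The compiler only ever produces the instruction list; running it is the job of a separate component, much like the JVM is separate from the Java compiler.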

    Complexity of lowering source code

    As you mentioned, lowering source code to assembly is complex. Even "lowering to assembly" is ambiguous: which assembly language do we mean? Different CPUs offer different instructions; just look at the various SIMD instruction sets. Likewise, not every CPU is optimized for the same workload, so it can pay off to pick different instructions depending on the CPU.

    Lowering source code is therefore incredibly tricky to do entirely on your own, especially if the compiler's output is meant to work across platforms.
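
    As a rough illustration of why one back end per CPU is extra work, the sketch below lowers the same tiny program to two instruction sets. The accumulator-style IR (`load`, `add`, `mul`) is a made-up format for this example, both functions assume small non-negative integer constants, and the output is only illustrative assembly text, not a complete program. Note that the AArch64 version needs an extra instruction, because its `mul` takes registers only, while x86-64's `imul` accepts an immediate operand.

```python
# Hypothetical accumulator-style IR: each step updates a single running value.
IR = [("load", 2), ("add", 3), ("mul", 4)]


def lower_x86_64(ir):
    """Lower the toy IR to x86-64 assembly text (Intel syntax), using eax."""
    lines = []
    for op, value in ir:
        if op == "load":
            lines.append(f"mov eax, {value}")
        elif op == "add":
            lines.append(f"add eax, {value}")
        elif op == "mul":
            # imul has a form that multiplies by an immediate value.
            lines.append(f"imul eax, eax, {value}")
        else:
            raise ValueError(f"unknown IR op: {op}")
    return lines


def lower_aarch64(ir):
    """Lower the toy IR to AArch64 assembly text, using w0 (and w1 as scratch)."""
    lines = []
    for op, value in ir:
        if op == "load":
            lines.append(f"mov w0, #{value}")
        elif op == "add":
            lines.append(f"add w0, w0, #{value}")
        elif op == "mul":
            # mul only takes registers, so the constant is loaded first.
            lines.append(f"mov w1, #{value}")
            lines.append("mul w0, w0, w1")
        else:
            raise ValueError(f"unknown IR op: {op}")
    return lines


print("\n".join(lower_x86_64(IR)))
print("\n".join(lower_aarch64(IR)))
```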

    Because of this complexity, several toolchains exist that aim to simplify the compilation process. One prominent example is LLVM, which lets a language compile to a simplified, assembly-like language called LLVM IR. Once the code is translated, LLVM takes over the rest of the compilation and handles the cross-platform support. The responsibility for producing an executable is thereby passed on to LLVM.

    One can say that LLVM IR is the target language for the source code. Likewise, LLVM can be seen as a compiler that takes LLVM IR as its source code and produces an executable as output.
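
    A front end that targets LLVM can be sketched in a few lines. The example below is only a toy under stated assumptions: it handles integer `+`, `-`, and `*`, uses Python's `ast` module as a stand-in parser, and writes textual LLVM IR by hand rather than going through LLVM's own APIs. The function name `to_llvm_ir` and the `%t0`, `%t1` temporaries are invented for this example.

```python
import ast

# Map Python AST operators to the matching LLVM IR integer instructions.
LLVM_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def to_llvm_ir(source):
    """Emit textual LLVM IR for a main() that returns the expression's value."""
    body = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in LLVM_OPS:
            lhs, rhs = emit(node.left), emit(node.right)
            name = f"%t{len(body)}"  # fresh temporary for each instruction
            body.append(f"  {name} = {LLVM_OPS[type(node.op)]} i32 {lhs}, {rhs}")
            return name
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    result = emit(ast.parse(source, mode="eval").body)
    lines = ["define i32 @main() {", "entry:", *body, f"  ret i32 {result}", "}"]
    return "\n".join(lines) + "\n"


print(to_llvm_ir("(2 + 3) * 4"))
```

    The front end stops at this text. Turning it into machine code for a particular CPU is then left to LLVM's tools, such as `llc` or `clang`, which accept `.ll` files.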

    So do most compilers do everything on their own?

    If we see LLVM as a separate compiler, then the answer is often no. C, C++, Rust, and several other languages have compilers (such as Clang and rustc) that use LLVM as their back end. They therefore depend on another tool for code generation.

    Languages compiling to a "high-level" language

    There are also languages that compile to a high-level language. The advantage of this approach is that the new language can lean on the target language's existing features to simplify translation and benefit from the optimizations of the target's compiler. This can simplify the implementation of a new language.

    An example of this approach is the TypeScript compiler, which compiles to JavaScript. There are several other compilers that work this way, but at the moment I cannot think of any examples. In my personal language projects I have used both the LLVM approach and this approach several times.
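
    The same toy expression language can also be "transpiled" to C source, which is the scenario the question asks about. This is a minimal sketch, assuming integer-only arithmetic and using Python's `ast` module as the parser; `transpile_to_c` is a name invented for this example.

```python
import ast

# Operators that look the same in the toy language and in C.
C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_c_expr(node):
    """Render an expression tree as a fully parenthesized C expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        left, right = to_c_expr(node.left), to_c_expr(node.right)
        return f"({left} {C_OPS[type(node.op)]} {right})"
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def transpile_to_c(source):
    """Wrap the translated expression in a C program that prints its value."""
    expr = to_c_expr(ast.parse(source, mode="eval").body)
    return "\n".join([
        "#include <stdio.h>",
        "",
        "int main(void) {",
        f'    printf("%d\\n", {expr});',
        "    return 0;",
        "}",
        "",
    ])


print(transpile_to_c("(2 + 3) * 4"))
```

    The generated file can be handed to any C compiler, which then takes care of optimization and machine code generation. The transpiler itself never has to know anything about the CPU.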

    Note

    I tried to simplify the whole process as much as possible, so there is much more nuance in practice. This answer is only meant to provide a high-level overview of the problem.