clang llvm bootstrapping

What are the benefits to using a bootstrapped compiler versus the same compiler built using the system's tooling?


I've been experimenting with recent versions of clang on an ancient Linux distro, using the system-provided gcc7 to build a very recent version of the llvm project. This has been working great for me, and I've been using the generated clang binary extensively to build many other projects.

I was recently told though that this is not the correct process -- and that once clang is built with gcc, you should be building clang with clang and using that as your compiler.

What is the benefit of bootstrapping a compiler this way, to someone who is just an end user? Is there anything that will work differently with a bootstrapped vs. a non-bootstrapped version?


Solution

  • Generally, there should be no user-observable difference, aside from speed, between using Clang compiled by GCC and Clang compiled by Clang. So long as both compilers work correctly, and so long as the code being compiled (Clang's source code) does not rely on undefined behavior or implementation-defined behavior that is different between the two, then both will operate the same way and produce identical output. (In practice, both the compiler and compilee have plenty of bugs, but also in practice, this exercise is extremely unlikely to reveal any of them, provided both actually compile and link successfully.)

    There can be a speed difference, since two different code generators and optimizers are in use. If one of the compilers is significantly older, then it might not know about later CPU models and hence not take full advantage of all of the possible hardware features. One compiler might also simply have a better optimizer. Obviously, this depends on what switches were passed to the compiler; in particular, if the -O (optimize) switch was only passed to one, then there would likely be a substantial speed difference.

    Somewhat anecdotally, the slide deck Building Clang/LLVM efficiently by Tilmann Scheller, presented at EuroLLVM 2015, reports on compile times (of the Clang sources) when using Clang built with either Clang or GCC and various optimization switches. He reports (on slide 11) that compiling Clang with GCC's profile-guided optimization outperformed other configurations by around 16%. Meanwhile, when Clang was compiled with ordinary -O3 by both compilers, the speeds of the resulting compilers were indistinguishable. But of course that was measured while compiling one program, nine years ago, so extrapolation is tenuous.

    There is nothing "incorrect" about continuing to use the non-bootstrapped compiler. Observe, for example, that Clang's build instructions do not mention recompiling it again with the output.

    In contrast, GCC's build instructions do say to do that (and in fact its make does so by default), but the justification is weak:

    "bootstrapping is suggested because the compiler will be tested more completely and could also have better performance."

    That suggestion goes back to the early days of GCC development, when GCC itself was more suspect (and hence bootstrapping served as a potentially valuable test of the compiler) and the Unix vendor compilers were quite varied in their optimization quality.
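If you want to check the two claims above on your own machine (identical output, possibly different speed), a rough sketch along these lines may help. It compiles one source file with each clang binary, hashes the resulting object files, and records the wall-clock time of each compile. The function names and the default flags are assumptions made for illustration, not part of any LLVM tooling, and a single timed run is only a crude indication of speed.

```python
import hashlib
import subprocess
import tempfile
import time
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compile_and_time(compiler, source, output, flags=("-O2", "-c")):
    """Run `compiler` on `source`; return (elapsed seconds, SHA-256 of output).

    The command line is: compiler, flags..., source, -o, output.
    """
    start = time.perf_counter()
    subprocess.run([compiler, *flags, str(source), "-o", str(output)], check=True)
    elapsed = time.perf_counter() - start
    return elapsed, sha256_of(output)


def compare_compilers(stage1, stage2, source, flags=("-O2", "-c")):
    """Compile `source` with both binaries and compare the object files.

    `stage1` and `stage2` are paths to the two compiler executables
    (for example, clang built by gcc and clang built by clang).
    """
    with tempfile.TemporaryDirectory() as tmp:
        t1, h1 = compile_and_time(stage1, source, Path(tmp) / "stage1.o", flags)
        t2, h2 = compile_and_time(stage2, source, Path(tmp) / "stage2.o", flags)
    return {"identical": h1 == h2, "stage1_seconds": t1, "stage2_seconds": t2}


# Example call with hypothetical install paths; adjust to your own builds:
# compare_compilers("/opt/clang-by-gcc/bin/clang",
#                   "/opt/clang-by-clang/bin/clang",
#                   "hello.c")
```

If both binaries were built from the same LLVM sources, the `identical` field would be expected to come back true for ordinary inputs. Flags that embed paths or debug information could make the object files differ for reasons unrelated to bootstrapping.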