Are multiple compilation units still worthwhile when (execution time) >>> (compile time)?

Based on my understanding, the chief benefits of creating a program with multiple compilation units are reusability of components and shorter compile times when incorporating small changes.

I also think (possibly wrongly) that there is a penalty associated with this, in that functions which are defined in their own compilation units cannot be declared as "inline".
[I recognize that this keyword does not actually force the compiler to inline-expand functions, but my understanding is that it gives the compiler more flexibility to optimize, and is therefore worth including wherever possible.]

So far so good?

My real question is whether the cost/benefit analysis still favours multiple compilation units when the program is solving a complicated modelling problem, and is required to iterate through its main loop for months on a cluster in order to generate useful output.

Say a multi-compilation unit program takes a few minutes to compile while the same program re-configured as a single compilation unit takes a few hours to compile... if the single compilation unit declares all functions as inline and thus presents more optimization opportunities, it seems reasonable to me to expect that execution time could decrease by a few percent, more than making up for the extra compile time.

Are there good rules of thumb for situations like this, or is it heavily situation-dependent?

Solution

As already stated by others, the main benefit of decomposing a program into different compilation units is readability. Shorter compilation times are a nice side effect.

If you care about inlining, you can resort to link-time optimization (LTO; MSVC calls it Link Time Code Generation). It is enabled with -flto in GCC and Clang, and with /GL plus /LTCG in MSVC. The combination of decomposing the program into compilation units and LTO looks like the ideal solution, although it is not clear to me whether every optimization the compiler performs when the full definition of a function is visible can also be performed by LTO. For example, I didn't know whether LTO could support return-value optimization in C++, since that is done at a high level of abstraction. Performance tests are needed. (Edit: RVO is performed even without LTO or similar tricks, at least on gcc and clang [I tried]. Most probably this works because the calling convention already passes a "hidden pointer" to the storage in which the returned object is to be constructed, so the function builds the object in place regardless of which compilation unit it lives in.)
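A minimal sketch of the "hidden pointer" behaviour mentioned in the edit, for anyone who wants to repeat the check. All names here (`Tracked`, `make_tracked`, `constructed_in_callers_storage`, the two globals) are made up for illustration and are not from any library. The idea is to record the address the constructor sees and compare it with the address of the caller's variable.

```cpp
#include <cassert>

// Illustrative bookkeeping globals (hypothetical, not from the original post).
static const void* g_constructed_at = nullptr;  // `this` as seen by the default constructor
static int g_copies_or_moves = 0;               // copy/move constructions since the last reset

struct Tracked {
    double payload[4];

    Tracked() : payload{} { g_constructed_at = this; }
    Tracked(const Tracked& other) { copy_from(other); ++g_copies_or_moves; }
    Tracked(Tracked&& other) noexcept { copy_from(other); ++g_copies_or_moves; }

private:
    void copy_from(const Tracked& other) {
        for (int i = 0; i < 4; ++i) payload[i] = other.payload[i];
    }
};

// Returns a temporary by value; this is the case where the copy can be elided.
Tracked make_tracked() {
    return Tracked();
}

// True when the object was built directly in the caller's variable,
// i.e. same address and no copy or move constructor call in between.
bool constructed_in_callers_storage() {
    g_constructed_at = nullptr;
    g_copies_or_moves = 0;
    Tracked t = make_tracked();
    assert(g_constructed_at != nullptr);  // the default constructor ran and recorded its address
    return g_constructed_at == static_cast<const void*>(&t) && g_copies_or_moves == 0;
}
```

Calling `constructed_in_callers_storage()` from `main` should return true: from C++17 on the elision is mandatory for this kind of return, and under older standards GCC and Clang perform it by default unless `-fno-elide-constructors` is passed. Note that the sketch is a single file, so it only shows the in-place construction; to check the cross-unit case, `make_tracked` would have to be moved into its own source file and built without LTO.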

A third option that could be worth investigating is something like the SQLite amalgamation, a process that concatenates the separate compilation units into one giant .c file. It does look like something that requires fairly heavy custom build infrastructure, though.
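To make the idea concrete, here is a rough sketch of what an amalgamated file might look like for a tiny simulation. The file names in the comments and the functions `advance` and `simulate` are hypothetical; a real amalgamation would be generated by a script from the actual sources.

```cpp
#include <cstddef>

// ---- formerly step.cpp (hypothetical) ----
// One explicit Euler step for dx/dt = -x.
// In a multi-unit build the caller's unit would see only a declaration,
// so inlining this call would need LTO. Here the definition is visible.
// `static` gives it internal linkage, which is safe once everything is one unit.
static double advance(double x, double dt) {
    return x + dt * (-x);
}

// ---- formerly simulate.cpp (hypothetical) ----
// The long-running main loop; the optimizer can now inline `advance` into it.
double simulate(double x0, double dt, std::size_t steps) {
    double x = x0;
    for (std::size_t i = 0; i < steps; ++i) {
        x = advance(x, dt);
    }
    return x;
}
```

For example, `simulate(1.0, 0.5, 2)` takes two steps, 1.0 to 0.5 to 0.25. One thing the generating script has to watch for is that file-local names (static functions, anonymous namespaces, macros) from different sources can collide once they share a single unit.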