optimization compiler-construction compiler-optimization intermediate-code

What is the purpose of code optimization at the intermediate phase of a compiler?

Some code optimizations are carried out on the intermediate code because

(1) They enhance the portability of the compiler to the target processor.
(2) Program analysis is more accurate on intermediate code than on machine code.
(3) The information from dataflow analysis cannot otherwise be used for optimization.
(4) The information from the front end cannot otherwise be used for optimization.

IMO: Intermediate code is machine-independent, so it can be optimized once and then translated into machine code for whichever target is required. Hence option (1), but I have seen it explained elsewhere that option (2) is also true.

What is the purpose of code optimization at the intermediate phase of a compiler, and what are its benefits?

Solution

Portability is not the reason why many optimizations are performed on the intermediate code. However, it is an advantage that we get for free as a consequence of doing so. The other three points you stated are vague. Anyway, we don't have to discuss them.

To answer your question, I have to walk through the operation of a typical ahead-of-time (AOT) compiler (the question only applies to this type of compiler). During compilation, the compiler typically deals with five representations of the source code:

The textual representation (this is the code you've written).
The concrete syntax tree produced by the parser.
The abstract syntax tree built from the concrete syntax tree and checked by the semantic analyzer.
The intermediate representation (IR) produced by the IR code generator. This would be the last operation performed by the frontend.
The binary representation (as specified by the target ISA) produced by the binary code generator.
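To make the step from AST to IR concrete, here is a minimal Python sketch (my own illustration, not taken from any particular compiler) that lowers a tiny AST for `x = (a + b) * (a + b)` into a sequential three-address IR. The node classes `Num`, `Var`, `BinOp` and the functions `lower` and `lower_assignment` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Union

# Hypothetical toy AST node types.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str          # "+", "-" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

def lower(expr: Expr, code: List[str], temps: Iterator[int]) -> str:
    """Append three-address instructions for `expr` to `code` and return the
    operand (constant, variable, or temporary) that holds its value."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    lhs = lower(expr.left, code, temps)
    rhs = lower(expr.right, code, temps)
    tmp = f"t{next(temps)}"  # fresh temporary for this subexpression
    code.append(f"{tmp} = {lhs} {expr.op} {rhs}")
    return tmp

def lower_assignment(target: str, expr: Expr) -> List[str]:
    code: List[str] = []
    result = lower(expr, code, count(1))
    code.append(f"{target} = {result}")
    return code

# x = (a + b) * (a + b)
tree = BinOp("*", BinOp("+", Var("a"), Var("b")), BinOp("+", Var("a"), Var("b")))
for line in lower_assignment("x", tree):
    print(line)
```

The nested tree becomes a flat list of primitive instructions (`t1 = a + b`, `t2 = a + b`, `t3 = t1 * t2`, `x = t3`), which is the sequential shape later passes work on.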

Now let's see which representation is best for performing optimizations. Using any of the first three representations would result in an extremely slow compiler, because almost every optimization needs to extensively analyze and modify its input representation. I said "almost" because a few optimizations are performed by the frontend (typically on the AST). A common example is constant folding. Such optimizations are performed at this level because all the modifications they make are local (within an expression), so they're cheap. They also make the generated IR code a little cleaner and more amenable to further analysis. On the other hand, ASTs are perfect for semantic analysis, so the compiler can discover errors as early as possible and abort further processing if any are found.
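Constant folding on an AST might look like the following minimal Python sketch, assuming a toy AST (the classes `Num`, `Var`, `BinOp` and the function `fold` are hypothetical). Each rewrite inspects only one node and its already-folded children, which is what keeps the transformation local and cheap.

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str          # "+", "-" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr: Expr) -> Expr:
    """Fold constants bottom-up: an operator whose operands are both constants
    is replaced by the constant it evaluates to."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr  # Num and Var are already in folded form

# x * (2 + 3 * 4)
tree = BinOp("*", Var("x"), BinOp("+", Num(2), BinOp("*", Num(3), Num(4))))
print(fold(tree))  # BinOp(op='*', left=Var(name='x'), right=Num(value=14))
```

Because the pass never looks beyond a node's immediate children, it does not reassociate: `(x + 2) + 3` is left unchanged.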

The majority of compiler optimizations accept IR code as input and produce (hopefully optimized) IR code as output (some compilers may gradually lower the IR until the final pass emits binary code). An intermediate language is designed specifically to facilitate optimization. First, it has a sequential representation (similar to binary code) that can be easily modified. Second, the IR preserves most of the information available in the AST, including global, local, and temporary variable definitions and types. This expressiveness enables the compiler to optimize the code much more effectively. Third, it is low-level: its instructions are primitive, and one or a few consecutive IL instructions map to a few target ISA instructions. This helps the code generator do its job quickly.
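As a rough illustration of why a sequential IR is easy to transform, here is a minimal Python sketch of one IR-to-IR pass: local common-subexpression elimination within a single basic block. The `Instr` class, its opcode set, and `local_cse` are hypothetical; real compilers use richer IRs and more general techniques (for example, value numbering over SSA form).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Instr:
    dest: str
    op: str                # "+", "-", "*", or "copy" (hypothetical opcode set)
    args: Tuple[str, ...]

    def __str__(self) -> str:
        if self.op == "copy":
            return f"{self.dest} = {self.args[0]}"
        return f"{self.dest} = {self.args[0]} {self.op} {self.args[1]}"

def local_cse(block: List[Instr]) -> List[Instr]:
    """If an expression (op, operands) has already been computed into a name
    that still holds it, replace the recomputation with a copy of that name."""
    available: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    out: List[Instr] = []
    for ins in block:
        key = (ins.op, ins.args)
        if ins.op != "copy" and key in available:
            new = Instr(ins.dest, "copy", (available[key],))
        else:
            new = ins
        # Writing ins.dest invalidates expressions that read it or were stored in it.
        available = {k: v for k, v in available.items()
                     if v != ins.dest and ins.dest not in k[1]}
        # Record the expression unless the instruction overwrote one of its own operands.
        if new.op != "copy" and ins.dest not in ins.args:
            available[key] = ins.dest
        out.append(new)
    return out

block = [
    Instr("t1", "+", ("a", "b")),
    Instr("t2", "+", ("a", "b")),
    Instr("t3", "*", ("t1", "t2")),
    Instr("x", "copy", ("t3",)),
]
for ins in local_cse(block):
    print(ins)
```

The second `a + b` becomes the copy `t2 = t1`; a later copy-propagation pass could then turn `t3 = t1 * t2` into `t3 = t1 * t1`. The pass only edits list entries in place, which is exactly the kind of modification a sequential IR makes cheap.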

A few optimizations are performed on the binary code. These include first-pass or second-pass instruction scheduling and second-pass register allocation.
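Register allocation can be sketched in a greatly simplified form. The Python below is a hypothetical greedy, linear-scan-style allocator for a single basic block of machine-like instructions; it assumes each virtual register is defined exactly once before it is read, and it does not spill. Production allocators (graph coloring, full linear scan) also handle control flow, spilling, and register constraints. The instruction format and the name `allocate_registers` are my own assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical machine-like instruction: (opcode, dest, sources). dest and sources
# are virtual registers; "ld <name>" loads an input and has no register sources.
Instr = Tuple[str, str, Tuple[str, ...]]

def allocate_registers(block: List[Instr], physical: List[str]) -> List[Instr]:
    """Each virtual register lives from its definition to its last read, and its
    physical register is released as soon as the value is dead. Values that are
    never read stay allocated to the end of the block (treated as live-out)."""
    last_use: Dict[str, int] = {}
    for i, (_, _, srcs) in enumerate(block):
        for v in srcs:
            last_use[v] = i
    free = list(physical)
    where: Dict[str, str] = {}  # virtual register -> physical register
    out: List[Instr] = []
    for i, (op, dest, srcs) in enumerate(block):
        phys_srcs = tuple(where[v] for v in srcs)
        # Release sources read for the last time here (dict.fromkeys keeps order
        # and drops duplicates), so dest may reuse one of their registers.
        for v in dict.fromkeys(srcs):
            if last_use[v] == i:
                free.append(where.pop(v))
        if not free:
            raise RuntimeError("out of registers: a real allocator would spill here")
        where[dest] = free.pop(0)
        out.append((op, where[dest], phys_srcs))
    return out

# v4 = (a + b) * c, written with five virtual registers
block = [
    ("ld a", "v0", ()),
    ("ld b", "v1", ()),
    ("add", "v2", ("v0", "v1")),
    ("ld c", "v3", ()),
    ("mul", "v4", ("v2", "v3")),
]
for op, dest, srcs in allocate_registers(block, ["r0", "r1"]):
    print(op, dest, *srcs)
```

Five virtual registers fit into two physical ones because no more than two values are live at any point in this block.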

After all of this, the linker (if required) begins its work, which may include a few other optimizations.

Please note that most compiler optimizations can also be performed on binary code, although less effectively. When this is done at run time, it is called dynamic binary optimization, and it is used in dynamic binary translation and instrumentation.

I would like to say a few words about portability. The IL enables us to use the same backend for multiple source languages. However, even if we know for sure that only one language will ever be supported, the IL is still very important for the reasons I've just explained. Also, while a few extremely important optimizations depend on the target ISA, many optimizations transform code from IR to IR, and these are generally target-independent. Such optimizations are indeed portable and can be shared between backends for different target architectures.
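To illustrate the portability point, here is a minimal Python sketch in which a single target-independent IR-to-IR pass (simplifying `x + 0` and `x * 1`) is written once and its output is fed to two code generators. Both targets, their instruction syntax, and the names `Instr`, `simplify`, `emit_three_operand`, and `emit_two_operand` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Instr:
    dest: str
    op: str                # "add", "mul", or "mov"
    args: Tuple[str, ...]

IDENTITY = {"add": "0", "mul": "1"}  # x + 0 == x, x * 1 == x

def simplify(ir: List[Instr]) -> List[Instr]:
    """Target-independent pass: replace an operation with its identity element
    as one operand by a plain move of the other operand."""
    out = []
    for ins in ir:
        unit = IDENTITY.get(ins.op)
        if unit is not None and unit in ins.args:
            a, b = ins.args
            out.append(Instr(ins.dest, "mov", (b if a == unit else a,)))
        else:
            out.append(ins)
    return out

def emit_three_operand(ir: List[Instr]) -> List[str]:
    # Hypothetical target A: non-destructive "op dest, src1, src2".
    return [f"{i.op} {i.dest}, {', '.join(i.args)}" for i in ir]

def emit_two_operand(ir: List[Instr]) -> List[str]:
    # Hypothetical target B: destructive "op dest, src" (dest is also an input).
    lines = []
    for i in ir:
        a, b = i.args[0], i.args[-1]
        if b == i.dest:  # add/mul are commutative; avoid clobbering b with the mov
            a, b = b, a
        lines.append(f"mov {i.dest}, {a}")
        if i.op != "mov":
            lines.append(f"{i.op} {i.dest}, {b}")
    return lines

ir = [Instr("t1", "mul", ("a", "1")),
      Instr("t2", "add", ("t1", "b")),
      Instr("x", "add", ("t2", "0"))]
optimized = simplify(ir)  # runs once, shared by both backends
print("\n".join(emit_three_operand(optimized)))
print("\n".join(emit_two_operand(optimized)))
```

Only the two emitters know anything about the targets; supporting a third architecture would mean writing another emitter while `simplify` is reused unchanged.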