compiler-constructioncpu-registersvm-implementationregister-allocation

How do compilers store hundreds of variables in only a few registers?


Say you have a virtual machine that only has 4 registers, A, B, C, and D. How do compilers store so many variables with only a limited amount of space?

Are there multiple ways of doing this, or is there a single solid way that this is accomplished? What's the fancy scientific-term for this, and also is it considered a complicated problem?


Solution

  • I recommend you read Programming Language Pragmatics or the Dragon Books, in particular the chapters on register allocation.

    In short, there are many approaches to handling this situation. Commonly the compiler builds an intermediate representation which can be an abstract machine with an infinite number of registers or an SSA form. When code is generated for a particular target hardware/OS, these abstract registers are assigned to real registers or to stack locations, based on criteria like use frequency or life span of the abstract registers (i.e. your original variables).

    There are different approaches depending on the chosen intermediate representation (see for example here or here). The problem can be difficult if you are striving for an optimal solution (i.e. keep as many variables for as long as possible in actual registers without spilling them onto the stack), but there are simpler approaches like "linear scan register allocation" when time is critical e.g. in just-in-time compilations.

    If you'd like to dig into code, perhaps take a look at the LLVM infrastructure and their register allocation and this presentation.