I'm building a toy compiler from a C like language to a stack machine and I'm at the point where I need to figure out what to do with functions and block local variables. Thinking through it abstractly it looks like I have two options at opposite ends of the spectrum: 1) Preprocess and preallocate stack space for each variable, 2) Add special instructions to the VM to walk the stack.
This has the advantage of giving me all the addresses for the variables ahead of time so I don't have to be very clever or add any extra instructions to the VM for walking the stack. The downside is that it is potentially very wasteful because conditional code that never executes but declares a whole bunch of variables will take up a lot of unnecessary space. For example,
a : t1 = value;
if (test) {
b : t2; c : t3; d : t4; ...;
}
In the above code even if test
is always false I will still allocate space for all those variables within the conditional branch.
The other approach I could come up with was to generate code for each variable declaration and then add some special VM instructions to figure out the address of those variables at runtime. This solves the problem of wasted stack space but then adds a computational overhead that I can potentially solve by some caching methods.
So what's the correct approach and is there another approach I didn't think of that is better?
The idea of a stack machine is that it does computations on an operand stack. It doesn't mean everything has to be stored on a stack. This is a common misconception. Typically your local vaiables / block scoped access maps to a register operation.
.NET CLR and Java both have instructions to store and fetch "local" variables as well as other types of variables. I would suggest you follow suit, because you don't want to walk the stack for simple variable access. That is very inefficient. It should be efficient to load / store variables, like a register. Most stack machines still have random access storage.
In the CLR, we also pre-allocate all of our local variables at the beginning of each method. The variables you preallocate may be a mix of explicit high level variables and compiler generated temporaries. But nothing says they have to be on a stack. On the VMs I've worked on we implemented them in a fast access area like an associative array or a vector like structure. I recommend that you use ildasm to disassemble a .NET method and take note of how the local variables are declared and handled.
Example:
total = apples + oranges
Maps to:
ldloc 'apples' # load a local onto stack
ldloc 'oranges' # load a local onto stack
add # add 2 operands on stack
stloc 'total' # store local from stack
In a previous answer I gave an explanation of stack based machines and compared to register machines. I hope there is some useful information in it. https://stackoverflow.com/a/24301283/257090
It is fairly straightforward to implement a prototype of stfld (store field) and ldfld (load field) with a simple Dictionary or HashTable. Later, you can optimize the assemble or runtime to compile symbolic named based references to integer references, especially if there is no need for runtime lookup of variables by name. However, for reflection APIs, you will need to implement additional metadata to cross-reference the numerical or pointer addresses back to their original names.
PS: If I've misunderstood your question, or you want to discuss more, please comment.