During the development stages of my compiler I ran into a pretty complex problem: how to store weakly typed variables in my language.
Since I allow variables to be declared without explicitly specifying their type, and allow functions to return either type (e.g. a function can return a scalar OR an array), I now have to decide what form to store these variables in.
Here are the possibilities I've considered, but all of them have a significant overhead:
- Store everything as a List<double> and have the first element specify whether it's a scalar or an array (0 or 1, for instance).
- Store everything as object instances.
- Store everything as a TVar (custom class), which can be either a double or a List<double>.

To keep in mind:
- The target is ILAsm, which is a higher-level flavour of assembly (.NET intermediate language, basically).

This obviously depends a lot on your language. If you don't fix variable types at compile time, then you need to wrap all values with type information. (This is sometimes referred to as "boxing" the variable, although it's not the only thing that "boxing" can mean.)
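As a rough illustration of what wrapping values with type information costs, here is a minimal Python sketch of a tagged value. The names TVar, box and add are hypothetical, and the rule that a scalar plus an array adds the scalar to every element is an assumption made only for the example, not something the question specifies.

```python
from dataclasses import dataclass
from typing import List, Union

SCALAR, ARRAY = "scalar", "array"


@dataclass(frozen=True)
class TVar:
    """A value stored together with a tag saying what kind of value it is."""
    kind: str
    payload: Union[float, List[float]]


def box(value) -> TVar:
    """Wrap a raw number or sequence; the tag is decided once, on creation."""
    if isinstance(value, (int, float)):
        return TVar(SCALAR, float(value))
    return TVar(ARRAY, [float(x) for x in value])


def add(a: TVar, b: TVar) -> TVar:
    """Every operation has to inspect the tags before doing any real work."""
    if a.kind == SCALAR and b.kind == SCALAR:
        return TVar(SCALAR, a.payload + b.payload)
    if a.kind == ARRAY and b.kind == ARRAY:
        if len(a.payload) != len(b.payload):
            raise ValueError("array lengths differ")
        return TVar(ARRAY, [x + y for x, y in zip(a.payload, b.payload)])
    # Mixed case (assumed semantics): add the scalar to each array element.
    scalar, array = (a, b) if a.kind == SCALAR else (b, a)
    return TVar(ARRAY, [scalar.payload + x for x in array.payload])


total = add(box(1.5), box([1, 2, 3]))
```

The tag check in add is the overhead in question: it runs on every operation, whichever of the three storage options carries the tag.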
On the other hand, you might be able to deduce the variable type at compile time. For example, awk (which, despite its complete lack of declaration syntax, is sometimes implemented with a compiler to some kind of virtual machine) allows both scalar and array variables, but it is quite possible to figure out the type of each awk variable:
- Aside from being passed as function arguments, an array variable cannot be used without a subscript, because awk does not allow array assignment. So any variable used with subscripts must be an array, and any variable used without subscripts, except in the call to a function, must be a scalar.
- Functions don't have prototypes either, but all useful parameters must be either used in the function body or passed to another function. So it is possible to create a prototype for every function, identifying each variable as scalar/array/unknown.
Scanning the function calls repeatedly until nothing changes (a least fixed-point iteration) will then provide precise information about every useful variable. If a variable is used both as a scalar and as an array, then an error can be thrown. If a variable is not used at all (except for possibly being passed to functions which don't use the corresponding parameter), then the variable could simply be eliminated, or it could be compiled as an (unused) scalar.
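The scan described above could look roughly like the following Python sketch. The program representation is invented for the example: each function is a dict with a "params" list and a "uses" list, where a use is ("scalar", var), ("array", var) or ("call", callee, [argument variables]). It also assumes that all variables are local to a function and that call arguments are plain variable names, which is a simplification of real awk.

```python
UNKNOWN, SCALAR, ARRAY = "unknown", "scalar", "array"


def infer_kinds(functions):
    """Return {(function, variable): kind} for every variable whose kind
    can be deduced; variables that stay unknown are left out."""
    kinds = {}

    def merge(key, kind):
        # Record new information; report whether anything changed.
        old = kinds.get(key, UNKNOWN)
        if kind == UNKNOWN or kind == old:
            return False
        if old != UNKNOWN:
            raise TypeError(
                f"{key[1]} in {key[0]} is used as both {old} and {kind}")
        kinds[key] = kind
        return True

    changed = True
    while changed:  # repeat until a full pass learns nothing new
        changed = False
        for name, fn in functions.items():
            for use in fn["uses"]:
                if use[0] in (SCALAR, ARRAY):
                    changed |= merge((name, use[1]), use[0])
                    continue
                _, callee, args = use
                for arg, param in zip(args, functions[callee]["params"]):
                    # Information flows in both directions across a call.
                    changed |= merge((name, arg),
                                     kinds.get((callee, param), UNKNOWN))
                    changed |= merge((callee, param),
                                     kinds.get((name, arg), UNKNOWN))
    return kinds


example = {
    "main": {"params": [],
             "uses": [("call", "fill", ["data", "n"]), ("scalar", "n")]},
    "fill": {"params": ["arr", "count"],
             "uses": [("call", "store", ["arr"]), ("scalar", "count")]},
    "store": {"params": ["a"], "uses": [("array", "a")]},
}
result = infer_kinds(example)
```

Here data is never subscripted in main, yet it ends up classified as an array because it is passed to fill, which passes it on to store, which subscripts it. The loop terminates because each variable can change only once, from unknown to a known kind.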
That's not enough to fully type awk variables, as there are three scalar types, so boxing is still needed in most cases. In some cases, it is probably possible to deduce scalar types as well, although it will be trickier because of automatic coercions. However, your language only has a single scalar type, so a strategy similar to the above might be workable.