oopcompiler-errorscompiler-constructioncompiler-optimizationtype-inference

How to handle forward references with type inference


Making a compiler for oop language, with the same language. The compiler currently traverse the ast 4 times, first two are used for resolving type chain, the 3rd is used to populate symbol table for future lookup and final is to resolve the symbols. During the resolution I am also trying to do type inference for resolving type for variable symbols.

Here is the sample code I am using to test the compiler.

var c = a.name;

var a = TestClass(1);

var d = a.test(1);

class TestClass<E> {
  TestClass(E name) : this.name = name;

  TestClass.create(this.name);

  E name;

  E test(E element) {
    return element;
  }

}

In the above code c is dependent on a for inferring its own type but a also need type inference. So, when the compiler tries to look up a when resolving the type for c it gets Unknown and throws a error.

this is my first time making a compiler, I am out of ideas on how to solve this problem.


Solution

  • Solving the problem

    You would need to generate a dependency map of the (unknown) types as the first type inference pass. As a data structure, the map can be either a set of pairs or an associative array of sets. The former is used here for illustrative purposes. For the provided example, we would have a dependency map as follows:

    c -> a.name 
    a.name -> a
    a -> TestClass(1)
    TestClass(1) -> TestClass
    d -> a.test(1)
    a.test(1) -> a.test
    a.test -> a
    

    Note that unknowns may depend on more than one other unknown. For example, if we had something along the lines of var a = b + c, then we would have both a -> b and a -> c as dependencies.

    I have kept the numeric literal 1 since I am not aware whether Dart allows templates to have constants along the lines of C++ template <unsigned i>. If it does not, we could use the type of a numeric literal instead (e.g. int, Number or Literal<int>) to make analysis faster.

    We do not add the implicit template parameters since we do not know which of the calls are made to template functions/classes. If the template parameter is explicitly provided for initialization of a, we would have the following dependencies for a instead.

    a -> TestClass<int>(1)
    TestClass<int>(1) -> TestClass<int>
    TestClass<int> -> TestClass
    

    We can then take all of the unknowns in the dependency map and sort the dependents after the corresponding dependencies. After sorting, we can resolve the types for each of the unknowns without running into issues. For the example, we would get the following list after sorting:

    TestClass
    TestClass(1)
    a
    a.test
    a.test(1)
    d
    a.name
    c
    

    Improving performance

    To reduce the computational cost of the sorting and to detect and report/resolve circular dependencies, we can compute the full sets of dependencies for each unknown before sorting. To do this, we repeatedly expand every dependency in the dependency map until the dependency map no longer changes; this procedure is sometimes called "closure" in compiler-related literature. As pseudocode, the procedure can be expressed as:

    let set: set of (T, T); # our set of dependencies
    let unknowns: set of T; # the set of all the unknowns
    let prevSize = 0;
    
    while prevSize < set.size: # continue while set is growing
        prevSize = set.size;
        for every (dependent, dependency) in set: # b -> c
            for every unknown in unknowns:
                if (unknown, dependent) in set: # if a -> b, then a -> c
                    add (unknown, dependency) to set
    

    The expansion is completed only after the AST has been fully traversed, although we could do partial updates at each step where a new dependency is added to the dependency map. In the final set, circular dependencies will appear in the form of unknown -> unknown. To illustrate the resulting set, here is a partial result for the provided example (which has no circular dependencies):

    c -> a.name
    c -> a
    c -> TestClass(1)
    c -> TestClass
    ...
    d -> a.test(1)
    d -> a.test
    d -> a
    d -> TestClass(1)
    d -> TestClass
    a.test(1) -> a.test
    a.test(1) -> a
    a.test(1) -> TestClass(1)
    a.test(1) -> TestClass
    ...