compiler-constructionllvmllvm-irdynamic-dispatch

How to implement Object Oriented Dynamic Dispatch in LLVM?


I am trying to make a toy compiler for an object oriented language which supports dynamic dispatch and simple inheritance. Meaning any class Child that extends a parent can be used whenever a Parent class is declared. It inherits all its fields and its methods and it may override methods.

Up to now I have been considering implementing virtual function tables and making sure the memory layouts of Parent and Child classes are as similar as possible. Here is a small example in C to illustrate what I mean:

#include <stdlib.h>

typedef struct {
    struct ParentVTable *_vtable;
    int inheritedField;
} Parent;

struct ParentVTable {
    void (*inheritedMethod)(Parent *, int);
};

void Parent_inheritedMethod(Parent *self, int b) {
    b = 0;
}

struct ParentVTable ParentVTable_inst = {
    .inheritedMethod = &Parent_inheritedMethod
};

void Parent_init(Parent *self) {
    self->_vtable = &ParentVTable_inst;
    self->inheritedField = 42;
}

Parent *Parent_new(void) {
    Parent *self = (Parent*)malloc(sizeof(Parent));
    Parent_init(self);
    return self;
}

typedef struct {
    struct ChildVTable *_vtable;
    int inheritedField;
    int newField;
} Child;

struct ChildVTable {
    void (*inheritedMethod)(Child *, int);
    int (*newMethod)(Child *, int);
};

int Child_newMethod(Child *self, int i) {
    return i + self->inheritedField;
}

struct ChildVTable ChildVTable_inst = {
    .inheritedMethod = (void (*)(Child *, int)) Parent_inheritedMethod,
    .newMethod = &Child_newMethod
};

void Child_init(Child *self) {
    Parent_init((Parent *) self);
    self->_vtable = &ChildVTable_inst;
    self->newField = 0;
}

Child *Child_new(void) {
    Child *self = (Child*)malloc(sizeof(Child));
    Child_init(self);
    return self;
}

int main() {
    Parent *p = (Parent*) Child_new();
    return 0;
}

As you can see in the main method, a Child class can be used as a Parent class provided that it is cast as a Parent. Using clang this cast translates in LLVM into a bitcast operation without which the llvm is not valid (the llvm code does not compile). Here is the corresponding piece of code:

define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca %struct.Parent*, align 8
  store i32 0, i32* %1, align 4
  %3 = call %struct.Child* @Child_new()
  %4 = bitcast %struct.Child* %3 to %struct.Parent*
  store %struct.Parent* %4, %struct.Parent** %2, align 8
  ret i32 0
}

Up to there everything is fine. The thing is that in the language I am trying to make, as well as in most object oriented languages that implement inheritance, this cast that is done in C is implicit. I have no way to see, at compile time, that a cast is required. My question, thus, is how do I know when I have to bitcast an llvm variable for example before assigning it or before passing it as an argument to call etc. Assuming this is only known at runtime. Am I missing something, should I simply bitcast every time to make sure the correct 'objects' are passed around in the code? Any help would be appreciated!


Solution

  • What you're seeing is a side effect of the Liskov substutution principle. Object-oriented languages have those implicit casts according to Liskov's principle, LLVM IR is an assembly language and doesn't, and you're writing a tool that translates code from one kind of language to the other so you have to add explicit casts.

    Write functions like isAssignableFrom() and cast() and call them when necessary, and you'll find that it works quite naturally.

    The same thing happens for primitive types, BTW. a=b requires you to check that a's type is assignable from b's and to add a cast, and the kind of cast depends on your language's type system. (In some languages, casting 42 to boolean produces true, in others false, and in some languages it leads to an error.) Your implementation of cast() may be quite complex by the time you've implemented all of your type rules.