
Dynamic Arrays in LLVM - Declaring a constant/global


I want to model dynamic arrays. This is the plan I've come up with: there will be a base struct for all my arrays, holding a vtable pointer and the runtime size of the array:

%anyarray_base = type {
  %other_stuff,
  i64            ; runtime size
}
%bytearray = type { %anyarray_base, [0 x i8] }

This works for arrays created fully at runtime. I malloc memory for the %anyarray_base plus the size of the "payload". I can access the data in that [0 x i8] using getelementptr just fine.
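
A minimal sketch of that runtime path, assuming opaque pointers (LLVM 15 or later), a one-pointer placeholder layout for %other_stuff, and the C library's malloc. The name @bytearray_new is hypothetical, and the sketch skips the null check and leaves the vtable pointer unset:

```llvm
%other_stuff   = type { ptr }                   ; assumed placeholder layout
%anyarray_base = type { %other_stuff, i64 }
%bytearray     = type { %anyarray_base, [0 x i8] }

declare ptr @malloc(i64)

; Allocates a byte array with room for %n payload bytes and records %n.
define ptr @bytearray_new(i64 %n) {
  ; Header size via the "GEP from null" idiom: address of element 1 of a
  ; %bytearray array that starts at address 0.
  %hdr.end  = getelementptr %bytearray, ptr null, i64 1
  %hdr.size = ptrtoint ptr %hdr.end to i64
  %total    = add i64 %hdr.size, %n
  %arr      = call ptr @malloc(i64 %total)
  ; Store the runtime size in field 1 of the embedded %anyarray_base.
  %size.ptr = getelementptr %bytearray, ptr %arr, i64 0, i32 0, i32 1
  store i64 %n, ptr %size.ptr
  ret ptr %arr
}
```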

The problem I have is constants. A very concrete case: my program has a constant string "Hello, World!", and I want to create a constant in my LLVM module to hold that string. So I'd write

@myConstantString = global %bytearray {
  %anyarray_base {
    @other_stuff,    ; constant misc data about the array
    i64 13           ; array size, 13 bytes
  },
  [13 x i8] c"Hello, World!"  ; the actual literal from the source code
}

llvm-as doesn't accept this:

error: element 1 of struct initializer doesn't match struct element type
(points at the global %bytearray constant)

I'm clearly missing some understanding of how LLVM works. Please help me build Hello World in my toy language :)


Solution

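  • What you need is types, lots of types! llvm-as rejects your constant because the initializer's second element is a sized array while %bytearray declares [0 x i8], and array types of different lengths are distinct types. Having many types adds complexity, which you should confine to the smallest possible part of your code.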

    You need one type for each array size that you will use as a constant. If your code uses string constants of length 0, 1, 2, 5, 10 and 15, you need six string types. These will typically be in a map from int to type, maintained by a small module with just two public functions:

    1. One function creates a constant array (such as a string) and returns a pointer to it. That function's purpose is to hide all of the sized types (five of the six in the example above). If the string is "hello", it asks the LLVM Module to define a global of your "five-byte string" type and returns a pointer to that global.

    2. The other function returns the string type. It always returns the "zero-byte string".

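    Most of the code you write uses only one type for strings, the zero-byte one, and will getelementptr beyond the end of that type. This is well-defined in LLVM IR as long as the resulting address stays within the object that was actually allocated. Only the one small module sees the many sized string types.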

    This is easily generalised to any array of constant-sized elements, and leads to pleasant, simple code when you use it with LLVM.
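
As a rough illustration of the scheme above, here is what the emitted IR might look like for the 13-byte "Hello, World!" constant. This is a sketch, not the only way to do it: it assumes opaque pointers (LLVM 15 or later; older versions would need a bitcast from the sized type to %bytearray*), and %bytearray.13, @bytearray_vtable, @bytearray_get and @first_byte are hypothetical names, as is the one-pointer layout of %other_stuff.

```llvm
%other_stuff   = type { ptr }                   ; assumed placeholder layout
%anyarray_base = type { %other_stuff, i64 }

; The one type most of the compiler sees: header plus zero-length payload.
%bytearray     = type { %anyarray_base, [0 x i8] }

; A sized sibling type, created on demand for 13-byte constants.
%bytearray.13  = type { %anyarray_base, [13 x i8] }

@bytearray_vtable = external constant i8        ; hypothetical symbol

; The constant is defined with the sized type, so the initializer matches.
@myConstantString = constant %bytearray.13 {
  %anyarray_base {
    %other_stuff { ptr @bytearray_vtable },
    i64 13
  },
  [13 x i8] c"Hello, World!"
}

; Reads byte %i of any byte array, constant or heap-allocated,
; by indexing past the zero-length payload field of %bytearray.
define i8 @bytearray_get(ptr %arr, i64 %i) {
  %p = getelementptr %bytearray, ptr %arr, i64 0, i32 1, i64 %i
  %b = load i8, ptr %p
  ret i8 %b
}

; Example use: the sized global is passed as a plain byte-array pointer.
define i8 @first_byte() {
  %b = call i8 @bytearray_get(ptr @myConstantString, i64 0)
  ret i8 %b
}
```

Both struct types place the payload at the same offset, because the header is identical and i8 arrays have byte alignment, so code written against %bytearray can read a %bytearray.13 global without knowing its length at compile time.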