assembly compiler-construction asmjit

How to Convert Lengthy Assembly Code to AsmJit Function Calls?


I am developing a compiler that processes Python code one line at a time, repeating the following steps for each line:

    . Take a line of Python code;
    . Convert the Python code to an IR (intermediate representation);
    . Convert the IR to assembly;
    . Optimization (planned for later);
    . Use AsmJit to convert the assembly code to machine code;
    . Execute the machine code at runtime, in memory;
    . Move on to the next line and start again.

I decided to use AsmJit to assemble the code. However, I now have a lengthy block of assembly code, and the difficulty lies in converting it to AsmJit function calls, which look like this:

  a.mov(eax, dword_ptr(esp, 4));    // Load the destination pointer.
  a.mov(ecx, dword_ptr(esp, 8));    // Load the first source pointer.
  a.mov(edx, dword_ptr(esp, 12));   // Load the second source pointer.
 
  a.movups(xmm0, ptr(ecx));         // Load 4 floats from [ecx] to XMM0.
  a.movups(xmm1, ptr(edx));         // Load 4 floats from [edx] to XMM1.
  a.addps(xmm0, xmm1);              // Add 4 floats in XMM1 to XMM0.
  a.movups(ptr(eax), xmm0);         // Store the result to [eax].
  a.ret();                          // Return from function.

Given the considerable size of my code, performing this conversion manually is extremely difficult. Is it possible to pass the assembly code to AsmJit as strings instead? If so, how should I proceed?


Solution

  • If I understand the question correctly, you have a pipeline and you would like to use AsmJit to encode the code it outputs (the output is assembly that you want to turn into machine code).

    This is possible with AsmJit and here is an example:

    #include <asmjit/x86.h>
    #include <stdio.h>
    #include <string.h>  // strlen()
    
    using namespace asmjit;
    
    class MyErrorHandler : public ErrorHandler {
    public:
      void handleError(Error err,
                       const char* message,
                       BaseEmitter* origin) override {
        printf("AsmJit error: %s\n", message);
      }
    };
    
    static x86::Gp createGp32(uint32_t id) noexcept { return x86::gpd(id); }
    static x86::Gp createGp64(uint32_t id) noexcept { return x86::gpq(id); }
    static x86::Vec createXmm(uint32_t id) noexcept { return x86::xmm(id); }
    
    int main() {
      MyErrorHandler eh;
      FileLogger logger(stdout);
    
      JitRuntime rt;
      CodeHolder code;
    
      code.init(rt.environment());
      code.setErrorHandler(&eh);
      code.setLogger(&logger);
    
      x86::Assembler a(&code);
    
      // An example of converting instruction name to AsmJit's instruction id.
      const char* instructionName = "vcvtsi2sd";
      InstId instructionId =
        InstAPI::stringToInstId(a.arch(), instructionName, strlen(instructionName));
    
      // An example of creating operands dynamically, encodes 'vcvtsi2sd xmm0, xmm1, rbx'.
      a.emit(instructionId, createXmm(0), createXmm(1), createGp64(3));
    
      return 0;
    }
    

    Basically, if you use emit(), you can create the whole instruction to assemble dynamically, based on the output from your own pipeline. The approach could be completely table driven (converting your IR opcode to a single instruction) or the translation could be more complex (converting a higher level IR to 1 or more instructions).
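To make the table-driven idea concrete, here is a minimal sketch of the lookup step in plain C++. It deliberately does not call AsmJit: `IrOp`, `IrInst`, `LoweredInst`, `kMnemonicTable` and `lowerIr` are hypothetical names invented for illustration, and the IR shape (two register numbers per instruction) is an assumption about your pipeline. Each lowered record holds a mnemonic and register ids, which is what you would then turn into an instruction id and register operands before calling emit(), as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical IR opcodes; a real pipeline would define its own set.
enum class IrOp : std::size_t { Move = 0, Add, Sub, Mul, Count };

// The lowering table: exactly one x86 mnemonic per IR opcode, indexed by IrOp.
static const char* const kMnemonicTable[] = {"mov", "add", "sub", "imul"};

// Hypothetical IR instruction of the form "dst = dst <op> src".
struct IrInst {
  IrOp op;
  std::uint32_t dst;  // register number of the destination
  std::uint32_t src;  // register number of the source
};

// One lowered instruction: everything needed to build a call to an encoder.
struct LoweredInst {
  std::string mnemonic;
  std::uint32_t dst;
  std::uint32_t src;
};

// Translates each IR instruction to one machine instruction via the table.
std::vector<LoweredInst> lowerIr(const std::vector<IrInst>& ir) {
  std::vector<LoweredInst> out;
  out.reserve(ir.size());
  for (const IrInst& inst : ir) {
    const std::size_t index = static_cast<std::size_t>(inst.op);
    assert(index < static_cast<std::size_t>(IrOp::Count));  // opcode must be in the table
    out.push_back(LoweredInst{kMnemonicTable[index], inst.dst, inst.src});
  }
  return out;
}
```

A higher-level IR would replace the single table lookup with a function that appends one or more `LoweredInst` records per IR instruction; the rest of the loop stays the same.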

    Check out InstId (the instruction identifier) and Operand. All operand types inherit from Operand, so you can store an Operand in your own structs and it can hold anything: a label, a register, an immediate value, or a memory address.

    If your pipeline outputs a textual representation, you can use AsmTK, which is basically a parser that uses AsmJit's instruction API to convert textual input into a form that AsmJit understands.
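If you go the textual route, the pipeline has to produce one string to hand to the parser. The sketch below shows only that step, in plain C++ with no AsmJit or AsmTK calls. `TextInst` and `renderAssembly` are hypothetical names, and the layout (one Intel-syntax instruction per line, operands separated by commas) is an assumption about what the parser expects, so check it against the AsmTK documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical textual instruction produced by the pipeline.
struct TextInst {
  std::string mnemonic;
  std::vector<std::string> operands;  // already formatted, e.g. "eax" or "[ecx]"
};

// Joins the instructions into a single newline-separated assembly listing.
std::string renderAssembly(const std::vector<TextInst>& insts) {
  std::string text;
  for (const TextInst& inst : insts) {
    assert(!inst.mnemonic.empty());  // every instruction needs a mnemonic
    text += inst.mnemonic;
    for (std::size_t i = 0; i < inst.operands.size(); ++i) {
      text += (i == 0) ? " " : ", ";  // space before the first operand, commas after
      text += inst.operands[i];
    }
    text += '\n';
  }
  return text;
}
```

For example, the instructions `movups xmm0, [ecx]` and `ret` would be rendered as two lines in one string, which can then be parsed in a single call rather than instruction by instruction.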