c++ bytecode control-flow decompiler

Reconstructing the Control Flow of a Decompiled Program


I am writing a very simple decompiler (in C++) for a basic compiled bytecode (an entirely different language). The executor uses a stack-based machine, and most instructions are fairly easy to piece back together.

I have run into an interesting predicament with conditions and looping constructs. One of the opcodes in the bytecode sets the executor's instruction pointer (much like a conditional jmp instruction) if the previous operand evaluates to false. If the condition is met, no jump is taken and execution simply falls through to the next instruction.

For now, I've implemented these as simple gotos, but I'd like to expand this functionality to piece together the original if / else constructs. Here's an example of the original source:

function myFunction() {
  if (this.var1 == "foo") {
    this.var2 = "bar";
  } else {
    this.var2 = "baz";
  }
}

And my decompiled output:

goto label23;

function myFunction() {
    if (!(this.var1 == "foo")) {
        goto label16;
    }
    this.var2 = "bar";
    goto label21;
    label16:
        this.var2 = "baz";
    label21:
        return 0;
}
label23:

Is there a method that I can apply to this "decompiled" source to piece the conditions back together in a way such that it resembles the original source? I know no decompiler is perfect, but I'm curious how decompilers like Ghidra tackle this sort of problem. Because my bytecode is not necessarily machine code (and more like compressed source code), I assume my use-case is much simpler than Ghidra's decompiler.


Solution

  • Reko is a decompiler that tries to reconstruct C-like code from machine code. It has a pass that reconstructs high-level constructs like if, while and switch statements. The code is based on the paper "Native x86 Decompilation using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring" by Edward J. Schwartz, JongHyup Lee, Maverick Woo and David Brumley.

    Although it is written in C#, it shouldn't be terribly hard to port that one class to C++.
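For bytecode as regular as the one in the question, a much smaller pattern match may already recover simple if / else blocks. The sketch below is not Reko's algorithm or the one from the paper. It assumes a hypothetical instruction model (`Op`, `Insn`, and `structure` are made-up names), uses instruction indices as jump targets, and assumes no other jump lands in the middle of a matched region. Anything it does not recognize, including backward jumps from loops, falls back to a goto.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (not the asker's real bytecode).
enum class Op { Stmt, JumpIfFalse, Jump };

struct Insn {
    Op op;
    std::string text;    // statement text (Stmt) or condition text (JumpIfFalse)
    std::size_t target;  // destination instruction index; unused for Stmt
};

// Emits source for code[begin, end), folding the pattern
//   JumpIfFalse cond -> E ; <then> ; Jump -> J ; E: <else> ; J:
// into if/else, and the same pattern without the Jump into a plain if.
std::string structure(const std::vector<Insn>& code, std::size_t begin,
                      std::size_t end, std::size_t depth) {
    assert(begin <= end && end <= code.size());
    const std::string pad(depth * 4, ' ');
    std::string out;
    std::size_t i = begin;
    while (i < end) {
        const Insn& in = code[i];
        if (in.op == Op::Stmt) {
            out += pad + in.text + ";\n";
            ++i;
        } else if (in.op == Op::JumpIfFalse && in.target > i && in.target <= end) {
            const std::size_t elseStart = in.target;
            std::size_t thenEnd = elseStart;  // end of the then-block (exclusive)
            std::size_t join = elseStart;     // where both branches meet again
            bool hasElse = false;
            if (elseStart > i + 1) {
                // A then-block that ends in a forward jump skips over an else-block.
                const Insn& last = code[elseStart - 1];
                if (last.op == Op::Jump && last.target > elseStart && last.target <= end) {
                    hasElse = true;
                    thenEnd = elseStart - 1;
                    join = last.target;
                }
            }
            out += pad + "if (" + in.text + ") {\n";
            out += structure(code, i + 1, thenEnd, depth + 1);
            if (hasElse) {
                out += pad + "} else {\n";
                out += structure(code, elseStart, join, depth + 1);
            }
            out += pad + "}\n";
            i = join;
        } else if (in.op == Op::Jump) {
            // Unrecognized shape: keep the goto (labels are not emitted in this sketch).
            out += pad + "goto L" + std::to_string(in.target) + ";\n";
            ++i;
        } else {
            out += pad + "if (!(" + in.text + ")) goto L" + std::to_string(in.target) + ";\n";
            ++i;
        }
    }
    return out;
}
```

Feeding it the five instructions of `myFunction` (the conditional jump to the else branch, the "bar" assignment, the jump past the else branch, the "baz" assignment, and the return) via `structure(code, 0, code.size(), 0)` should produce a single if / else followed by `return 0;`. Loops, `break`, and short-circuit conditions need the more general region-based analysis that the paper describes.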