We are writing a byte-code for a high-level compiled language, and after a bit of profiling and optimization, it became clear that the current largest performance overhead is the switch statement we're using to jump to the byte-code cases.
We investigated pulling out the address of each case label and storing it in the stream of byte-code itself, rather than the instruction ID that we usually switch on. If we do that, we can skip the jump table, and directly jump to the location of code of the currently executing instruction. This works fantastically in GCC, however, MSVC doesn't seem to support a feature like this.
We attempted to use inline assembly to grab the address of the labels (and to jump to them), and it works, however, using inline assembly causes the entire function to be avoided by the MSVC optimizer.
Is there a way to allow the optimizer to still run over the code? Unfortunately, we can't extract the inline assembly into another function other than the one that the labels were made in, since there's no way to reference a label for another function even in inline assembly. Any thoughts or ideas? Your input is much appreciated, thanks!
The only way of doing this in MSVC is by using inline assembly (which basically buggers you for x64):
int _tmain(int argc, _TCHAR* argv[])
{
case_1:
void* p;
__asm{ mov [p],offset case_1 }
printf("0x%p\n",p);
return 0;
}
If you plan on doing something like this, then the best way would be to write the whole interpreter in assembly then link that in to the main binary via the linker (this is what LuaJIT did, and it is the main reason the VM is so blindingly fast, when its not running JIT'ed code that is).
LuaJIT is open-source, so you might pick up some tips from it if you go that route. Alternatively you might want to look into the source of forth (whose creator developed the principle you're trying to use), if there is an MSVC build you can see how they accomplished it, else you're stuck with GCC (which isn't a bad thing, it works on all major platforms).