c assembly x86 compiler-optimization tcc

Tiny C Compiler's generated code emits extra (unnecessary?) NOPs and JMPs


Can someone explain why this code:

#include <stdio.h>

int main()
{
  return 0;
}

when compiled with tcc using tcc code.c produces this asm:

00401000  |.  55               PUSH EBP
00401001  |.  89E5             MOV EBP,ESP
00401003  |.  81EC 00000000    SUB ESP,0
00401009  |.  90               NOP
0040100A  |.  B8 00000000      MOV EAX,0
0040100F  |.  E9 00000000      JMP fmt_vuln1.00401014
00401014  |.  C9               LEAVE
00401015  |.  C3               RETN

I guess that

00401009  |.  90   NOP

is maybe there for some memory alignment, but what about

0040100F  |.  E9 00000000     JMP fmt_vuln1.00401014
00401014  |.  C9              LEAVE

I mean, why would the compiler insert a near jump that jumps to the very next instruction? LEAVE would execute anyway.

I'm on 64-bit Windows, generating a 32-bit executable using TCC 0.9.26.


Solution

  • Superfluous JMP before the Function Epilogue

    The JMP at the bottom of the function, which jumps to the very next instruction, was an issue that has since been fixed in a commit. Version 0.9.27 of TCC resolves it:

    When 'return' is the last statement of the top-level block (very common and often recommended case) jump is not needed.

    As for why it existed in the first place: each function has a single common exit point. When a return appears at the end of a nested block, a JMP transfers control to that common exit point, where the stack is cleaned up and the ret is executed. Originally the code generator also emitted this JMP, needlessly, when the return appeared just before the function's final } (closing brace). The fix checks whether a return statement is followed by the closing brace at the top level of the function. If it is, the JMP is omitted.

    An example of code that has a return at a lower scope before a closing brace:

    int main(int argc, char *argv[])
    {
      if (argc == 3) {
          argc++;
          return argc;
      }
      argc += 3;
      return argc;
    }
    

    The generated code looks like:

      401000:       55                      push   ebp
      401001:       89 e5                   mov    ebp,esp
      401003:       81 ec 00 00 00 00       sub    esp,0x0
      401009:       90                      nop
      40100a:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
      40100d:       83 f8 03                cmp    eax,0x3
      401010:       0f 85 11 00 00 00       jne    0x401027
      401016:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
      401019:       89 c1                   mov    ecx,eax
      40101b:       40                      inc    eax
      40101c:       89 45 08                mov    DWORD PTR [ebp+0x8],eax
      40101f:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
    
      ; Jump to common function exit point. This is the `return argc` inside the if statement
      401022:       e9 11 00 00 00          jmp    0x401038
    
      401027:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
      40102a:       83 c0 03                add    eax,0x3
      40102d:       89 45 08                mov    DWORD PTR [ebp+0x8],eax
      401030:       8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
    
      ; Jump to common function exit point. This is the `return argc` at end of the function 
      401033:       e9 00 00 00 00          jmp    0x401038
    
      ; Common function exit point
      401038:       c9                      leave
      401039:       c3                      ret
    

    In versions prior to 0.9.27, the return argc inside the if statement jumps to a common exit point (the function epilogue). The return argc at the bottom of the function jumps to the same common exit point. The problem is that the common exit point happens to be right after the top-level return argc, so the side effect is an extra JMP whose target is simply the next instruction.


  • NOP after Function Prologue

    The NOP isn't for alignment. Because of the way Windows implements guard pages for the stack (for programs in the Portable Executable format), TCC has two types of prologue. If the local stack space required is < 4096 bytes (smaller than a single page), you see this kind of code generated:

    401000:       55                      push   ebp
    401001:       89 e5                   mov    ebp,esp
    401003:       81 ec 00 00 00 00       sub    esp,0x0
    

    The sub esp,0 isn't optimized out. Its operand is the amount of stack space needed for local variables (in this case 0). If you add some local variables, the 0x0 in the SUB instruction changes to match the amount of stack space they need. This prologue requires 9 bytes. There is another prologue to handle the case where the stack space needed is >= 4096 bytes. If you add an array of 4096 bytes with something like:

    char somearray[4096];
    

    and look at the resulting instructions, you will see the function prologue change to a 10-byte prologue:

    401000:       b8 00 10 00 00          mov    eax,0x1000
    401005:       e8 d6 00 00 00          call   0x4010e0
    

    TCC's code generator assumes that the function prologue is always 10 bytes when targeting Windows PE. This is primarily because TCC is a single-pass compiler: it doesn't know how much stack space a function will use until after the function has been processed. To get around not knowing this ahead of time, TCC reserves 10 bytes for the prologue, enough to fit the largest variant. Anything shorter is padded to 10 bytes.

    When the stack space needed is < 4096 bytes, the instructions used total 9 bytes, and the NOP pads the prologue to 10 bytes. When >= 4096 bytes are needed, the number of bytes is passed in EAX and the function __chkstk is called to allocate the required stack space instead.