c multithreading debugging assembly disassembly

What is the exact difference and the relation between thread entry and thread start?


  1. What is the exact difference between thread entry and thread start?
  2. Does RIP (the current point of execution in a dynamic analysis) always reach them in the same predictable order?
  3. Does the thread entry change dynamically? (During dynamic analysis I think I saw it reported in registers and on the stack.)

My understanding so far is that the thread start depends on the point of view: at the OS level on Windows it is always ntdll.RtlUserThreadStart+21 (User), but at the program or library level it can be any function. In any case, the thread start is not called before the thread has been created by ntdll.NtCreateThreadEx+14 (System).

The thread entry is the function (exported by a library, or private) passed as an argument to the thread creation function.

An example of a callstack with threads (threadID, Address, to, from, size, comment, party) made with x64dbg:

4200                                                                                                 
      00000076EBDFF9A8 00007FFEC900A34E 00007FFECB4EC034 A0  ntdll.NtWaitForSingleObject+14          System
      00000076EBDFFA48 00007FF7987B48A1 00007FFEC900A34E 30  kernelbase.WaitForSingleObjectEx+8E     User
      00000076EBDFFA78 00007FF7988961A0 00007FF7987B48A1 30  mylibrarydll0.00007FF7987B48A1          User
      00000076EBDFFAA8 00007FF7987B13DF 00007FF7988961A0 30  mylibrarydll0.00007FF7988961A0          User
      00000076EBDFFAD8 00007FF798B4A175 00007FF7987B13DF 30  mylibrarydll0.00007FF7987B13DF          User
      00000076EBDFFB08 00007FFECA637034 00007FF798B4A175 30  mylibrarydll0.sub_7FF798B4A0B4+C1       System
      00000076EBDFFB38 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EBDFFBB8 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
2736                                                                                                 
      00000076EB5FF648 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14  System
      00000076EB5FF948 00007FFECA637034 00007FFECB4623D7 30  ntdll.TppWorkerThread+2F7               System
      00000076EB5FF978 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EB5FF9F8 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
2468                                                                                                 
      00000076EBBFFB78 00007FFEC900A34E 00007FFECB4EC034 A0  ntdll.NtWaitForSingleObject+14          System
      00000076EBBFFC18 00007FF7987B48A1 00007FFEC900A34E 30  kernelbase.WaitForSingleObjectEx+8E     User
      00000076EBBFFC48 00007FF7988961A0 00007FF7987B48A1 30  mylibrarydll0.00007FF7987B48A1          User
      00000076EBBFFC78 00007FF7987B13DF 00007FF7988961A0 30  mylibrarydll0.00007FF7988961A0          User
      00000076EBBFFCA8 00007FF798B4A175 00007FF7987B13DF 30  mylibrarydll0.00007FF7987B13DF          User
      00000076EBBFFCD8 00007FFECA637034 00007FF798B4A175 30  mylibrarydll0.sub_7FF798B4A0B4+C1       System
      00000076EBBFFD08 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EBBFFD88 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
3784                                                                                                 
      00000076EB6FFB88 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14  System
      00000076EB6FFE88 00007FFECA637034 00007FFECB4623D7 30  ntdll.TppWorkerThread+2F7               System
      00000076EB6FFEB8 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EB6FFF38 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
1928                                                                                                 
      00000076EB7FFA48 00007FFEC900A34E 00007FFECB4EC034 A0  ntdll.NtWaitForSingleObject+14          System
      00000076EB7FFAE8 00007FF7987B48A1 00007FFEC900A34E 30  kernelbase.WaitForSingleObjectEx+8E     User
      00000076EB7FFB18 00007FF7988961A0 00007FF7987B48A1 30  mylibrarydll0.00007FF7987B48A1          User
      00000076EB7FFB48 00007FF7987B13DF 00007FF7988961A0 30  mylibrarydll0.00007FF7988961A0          User
      00000076EB7FFB78 00007FF798B4A175 00007FF7987B13DF 30  mylibrarydll0.00007FF7987B13DF          User
      00000076EB7FFBA8 00007FFECA637034 00007FF798B4A175 30  mylibrarydll0.sub_7FF798B4A0B4+C1       System
      00000076EB7FFBD8 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EB7FFC58 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
2276                                                                                                 
      00000076EB8FF7C8 00007FFEC900A34E 00007FFECB4EC034 A0  ntdll.NtWaitForSingleObject+14          System
      00000076EB8FF868 00007FF7987B48A1 00007FFEC900A34E 30  kernelbase.WaitForSingleObjectEx+8E     User
      00000076EB8FF898 00007FF7988961A0 00007FF7987B48A1 30  mylibrarydll0.00007FF7987B48A1          User
      00000076EB8FF8C8 00007FF7987B13DF 00007FF7988961A0 30  mylibrarydll0.00007FF7988961A0          User
      00000076EB8FF8F8 00007FF798B4A175 00007FF7987B13DF 30  mylibrarydll0.00007FF7987B13DF          User
      00000076EB8FF928 00007FFECA637034 00007FF798B4A175 30  mylibrarydll0.sub_7FF798B4A0B4+C1       System
      00000076EB8FF958 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EB8FF9D8 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
12168                                                                                                
      00000076EB9FF6E8 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14  System
      00000076EB9FF9E8 00007FFECA637034 00007FFECB4623D7 30  ntdll.TppWorkerThread+2F7               System
      00000076EB9FFA18 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EB9FFA98 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User
2428                                                                                                 
      00000076EBAFF5D8 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14  System
      00000076EBAFF8D8 00007FFECA637034 00007FFECB4623D7 30  ntdll.TppWorkerThread+2F7               System
      00000076EBAFF908 00007FFECB49D0D1 00007FFECA637034 80  kernel32.BaseThreadInitThunk+14         System
      00000076EBAFF988 0000000000000000 00007FFECB49D0D1     ntdll.RtlUserThreadStart+21             User

Solution

  • Windows sends the debugger a specific set of events; they are listed in the documentation of WaitForDebugEvent.
    One of these events is CREATE_THREAD_DEBUG_EVENT (described by a CREATE_THREAD_DEBUG_INFO structure), which is sent when Windows has created but not yet started the thread.
    In Windows, process and thread creation happens in the kernel, but the final initialization steps happen in userspace (unless it's a picoprocess, which we won't address here). ntdll.dll is mapped into the process, and when a thread is created its context's RIP is set to point to one of this DLL's functions. This function performs the necessary initialization and then transfers control to the address given to CreateThread or a similar API. It is a kind of wrapper for threads.
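
    As a rough illustration of the event stream described above, here is a minimal sketch of a debugger loop in C. It assumes a Windows build (the loop itself is compiled only when _WIN32 is defined); run_debug_loop and debug_event_name are hypothetical helper names, not part of any API. The sketch only logs events and prints the lpStartAddress field that arrives with CREATE_THREAD_DEBUG_EVENT; a real debugger does far more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a debug event code to its documented name (values from the Windows SDK). */
const char *debug_event_name(unsigned long code)
{
    static const char *names[] = {
        "?", "EXCEPTION_DEBUG_EVENT", "CREATE_THREAD_DEBUG_EVENT",
        "CREATE_PROCESS_DEBUG_EVENT", "EXIT_THREAD_DEBUG_EVENT",
        "EXIT_PROCESS_DEBUG_EVENT", "LOAD_DLL_DEBUG_EVENT",
        "UNLOAD_DLL_DEBUG_EVENT", "OUTPUT_DEBUG_STRING_EVENT", "RIP_EVENT"
    };
    return (code >= 1 && code <= 9) ? names[code] : "?";
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: start `cmdline` as a debuggee and log its debug events. */
int run_debug_loop(char *cmdline)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    int running = 1;

    memset(&si, 0, sizeof si);
    si.cb = sizeof si;
    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                        DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return -1;

    while (running && WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        printf("tid %lu: %s\n", ev.dwThreadId,
               debug_event_name(ev.dwDebugEventCode));

        switch (ev.dwDebugEventCode) {
        case CREATE_THREAD_DEBUG_EVENT:
            /* The thread exists but has not executed any instruction yet. */
            printf("  lpStartAddress = %p\n",
                   (void *)(ULONG_PTR)ev.u.CreateThread.lpStartAddress);
            break;
        case CREATE_PROCESS_DEBUG_EVENT:
            if (ev.u.CreateProcessInfo.hFile)
                CloseHandle(ev.u.CreateProcessInfo.hFile);
            break;
        case LOAD_DLL_DEBUG_EVENT:
            if (ev.u.LoadDll.hFile)
                CloseHandle(ev.u.LoadDll.hFile);
            break;
        case EXCEPTION_DEBUG_EVENT:
            /* Swallow breakpoints; pass every other exception to the debuggee. */
            if (ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                status = DBG_EXCEPTION_NOT_HANDLED;
            break;
        case EXIT_PROCESS_DEBUG_EVENT:
            running = 0;
            break;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
    }

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
#endif
```

    Calling run_debug_loop from a main function with the command line of a test program should log one CREATE_THREAD_DEBUG_EVENT per new thread. Note that the documentation warns that lpStartAddress may only approximate the real starting address.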

    It is fairly safe to assume that the thread start event fires when the first instruction of the initialization function is about to execute (think of it as if Windows had set a breakpoint there).
    The thread entry is, instead, just the address given to the thread creation API. It is important because it is the actual code the caller intended to be executed. In fact, for debugging or RE purposes, you can almost always (if not always) ignore the thread start event.


    Let's do an example. Consider this simple 64-bit program.

    BITS 64
    
    EXTERN CreateThread 
    GLOBAL _start 
    
    SECTION .text 
    
    _start:
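        and rsp, -16               ; align the stack to 16 bytes
        
        push 0                     ; lpThreadId = NULL
        push 0                     ; dwCreationFlags = 0 (run immediately)
        sub rsp, 20h               ; shadow space for the four register arguments
        xor r9, r9                 ; lpParameter = NULL
        lea r8, [REL _thread1]     ; lpStartAddress = the thread entry
        xor edx, edx               ; dwStackSize = 0 (default size)
        xor ecx, ecx               ; lpThreadAttributes = NULL
        call CreateThread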
    
    .loop:
        TIMES 1000 pause 
    jmp .loop
    
    _thread1:
        TIMES 1000 pause
    jmp _thread1 
    

    All it does is create a thread pointing to a sled of pause instructions executed in a loop. The main thread also executes a similar, but separate, loop.
    The purpose of the loops is to keep each thread's RIP changing without ever entering a Windows API. Any instruction that doesn't fault would do; I picked pause, because :)

    Assemble and link the program.
    Open x64dbg, load the program, and then enable the Thread start and Thread entry events.

    [Screenshot: debug event settings in x96dbg]

    Now press F9 to reach the program entry point and press F9 again to let it go. The debugger will be notified of the new thread creation.

    [Screenshot: new thread created]

    Note that the execution stopped at the beginning of RtlUserThreadStart. This is always the case for my version of Windows (Windows 7 something). It makes sense, given the introduction at the beginning of this answer.
    Also note that the thread entry point is in rcx, meaning it is the first parameter for RtlUserThreadStart.
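
    The relation between the two can be modelled in a few lines of portable C. This is only a toy model, not what RtlUserThreadStart actually does: model_thread_start, my_thread_entry and call_order are made-up names, and the "initialization" is just a log entry. It shows the one property that matters here: the wrapper receives the entry address as an argument and runs before it.

```c
#include <assert.h>
#include <string.h>

/* Same shape as a Windows thread routine: one pointer in, one status out. */
typedef unsigned long (*thread_entry_fn)(void *param);

/* Records the order in which the two stages run. */
char call_order[32] = "";

/* Stand-in for the "thread entry": the function the caller asked to run. */
unsigned long my_thread_entry(void *param)
{
    strcat(call_order, "entry;");
    return *(unsigned long *)param;
}

/* Stand-in for the "thread start" wrapper: it receives the entry address
   as its first argument, does its setup, then transfers control to it. */
unsigned long model_thread_start(thread_entry_fn entry, void *param)
{
    assert(entry != NULL);
    strcat(call_order, "start;");   /* per-thread initialization would go here */
    return entry(param);
}
```

    Calling model_thread_start(my_thread_entry, &value) returns whatever the entry returns and leaves call_order as "start;entry;".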

    Now, this was the event that Windows sent to the debugger, so it's natural that the execution stopped here.
    But Windows has no thread entry debug event, so what is x64dbg doing here?
    You can solve this mystery by looking at the Breakpoints tab.

    [Screenshot: Breakpoints tab]

    You can see that the debugger set a one-time breakpoint (i.e. one that the debugger itself removes automatically) at the thread entry point.
    So, while Windows doesn't offer support for generating a debug event when a thread first starts executing its user code, a debugger can emulate it easily by putting a breakpoint there before the thread actually starts.
    Note that this means the debugger always reacts to thread start events; when the option is disabled, it simply doesn't stop, show the event, and wait for you to do something.
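
    The emulation can be sketched as a one-shot software breakpoint. The following C model works on a local byte buffer instead of the debuggee's memory, so it is an illustration under stated assumptions rather than x64dbg's actual code: struct one_shot_bp, bp_arm and bp_on_hit are hypothetical names. A real debugger would read and write the target through ReadProcessMemory and WriteProcessMemory, and would also have to move RIP back by one byte after the INT3 is hit.

```c
#include <assert.h>
#include <stddef.h>

#define INT3 0xCC  /* x86 software breakpoint opcode */

/* A one-shot software breakpoint: where it sits and which byte it replaced. */
struct one_shot_bp {
    size_t offset;
    unsigned char saved;
    int armed;
};

/* Patch INT3 over the first byte of the thread entry, remembering the original. */
void bp_arm(struct one_shot_bp *bp, unsigned char *code, size_t entry_offset)
{
    bp->offset = entry_offset;
    bp->saved = code[entry_offset];
    bp->armed = 1;
    code[entry_offset] = INT3;
}

/* Called when a breakpoint exception is reported at `hit_offset`.
   Returns 1 if the breakpoint was ours; in that case the original byte
   is restored and the breakpoint is dropped, so it can fire only once. */
int bp_on_hit(struct one_shot_bp *bp, unsigned char *code, size_t hit_offset)
{
    if (!bp->armed || hit_offset != bp->offset)
        return 0;
    code[bp->offset] = bp->saved;
    bp->armed = 0;
    return 1;
}
```

    Arming the breakpoint on the first byte of the entry replaces that byte with 0xCC; the first matching hit restores the original byte and disarms the breakpoint, so later hits at the same address are no longer claimed.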


    Pausing and resuming the thread doesn't change the thread entry point, which is fixed at thread creation.
    x64dbg has a Threads tab that allows the user to suspend and resume threads. Playing with it doesn't change the thread entry point; only the RIPs change, and they still point somewhere in the two loops (which exist to make this test easier).


    If the thread is created suspended (with the CREATE_SUSPENDED flag), the thread start event won't fire until the thread is resumed.
    But if, before resuming the thread, a pair of calls to GetThreadContext/SetThreadContext is made to change the thread's RIP, then RtlUserThreadStart will never be executed (I don't know exactly what this function does, but a thread can do without it) and the thread start event will never fire.
    The execution will go straight to the altered RIP.
    I'm not sure whether this is a legacy bug of Windows' debugging interface; the thread start event could be generated by setting the TF (trap flag) before the thread is first scheduled (and clearing it immediately upon catching the resulting exception).
    When debugging/REing a thread, what I usually do is put a breakpoint at the thread entry point (which is easy to get) or at the hijacked RIP (which is also easy to get: this kind of thread is created suspended, so you know something is fishy).
    If the program is being nasty and the code at the thread's RIP is not yet in the clear (e.g. it is still obfuscated), use a hardware breakpoint.

    Note: the same thing happens for process creation too, in exactly the same way (only with the PE entry point instead of a thread entry point).