`LoadLibraryExW` triggers exception `0xC0000023` from `NtMapViewOfSection`

It's going to be really hard to reduce the scope of this question, but here we go.

Context

I'm working on a 32-bit ActiveX control that is loaded into a host (TstCon.exe). After unloading and reloading the control, I get a streak of errors from NtMapViewOfSection, the first of which occurs when odbc32.dll uses LoadLibraryExW to load C:\Windows\system32\odbcint.dll. At that point, an SEH exception with code 0xC0000023 (STATUS_BUFFER_TOO_SMALL, according to the debugger) is raised from somewhere inside NtMapViewOfSection.
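For readers who don't decode NTSTATUS values by eye: the bit layout documented in Microsoft's [MS-ERREF] specification shows that 0xC0000023 is an error-severity status (top two bits set) with facility 0 and code 0x23. The symbolic name STATUS_BUFFER_TOO_SMALL itself comes from ntstatus.h, not from the bits. A minimal sketch; NtStatusFields and DecodeNtStatus are hypothetical helper names:

```cpp
#include <cstdint>

// Bit fields of a 32-bit NTSTATUS value, per [MS-ERREF]:
//   bits 30-31  Sev: 0 = success, 1 = informational, 2 = warning, 3 = error
//   bit  29     C:   set for customer-defined (non-Microsoft) codes
//   bit  28     N:   reserved
//   bits 16-27  Facility
//   bits  0-15  Code
struct NtStatusFields {
    std::uint32_t severity;
    bool customer;
    std::uint32_t facility;
    std::uint32_t code;
};

constexpr NtStatusFields DecodeNtStatus(std::uint32_t status) {
    return NtStatusFields{
        (status >> 30) & 0x3u,
        ((status >> 29) & 0x1u) != 0,
        (status >> 16) & 0xFFFu,
        status & 0xFFFFu,
    };
}

// The status reported by the debugger for the failing NtMapViewOfSection call.
constexpr std::uint32_t kStatusBufferTooSmall = 0xC0000023u;
static_assert(DecodeNtStatus(kStatusBufferTooSmall).severity == 3, "error severity");
static_assert(DecodeNtStatus(kStatusBufferTooSmall).code == 0x23, "code 0x23");
```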

Aftermath

Here's what the callstack looks like when the debugger intercepts the exception:

```
ntdll.dll!_NtMapViewOfSection@40()
KernelBase.dll!BasepLoadLibraryAsDataFileInternal()
KernelBase.dll!BasepLoadLibraryAsDataFile()
KernelBase.dll!LoadLibraryExW()
odbc32.dll!_InitializeDll@0()
odbc32.dll!_SQLAllocEnv@4()
<OurDll>.dll!<OurFunction>()
...
```

At that point, I retrieved the arguments of the NtMapViewOfSection call from the stack using perfectly sane techniques, following this documentation:

```
*(void**)(ESP + 4 + 0)           /*SectionHandle*/      0x000003b0              void *
*(void**)(ESP + 4 + 4)           /*ProcessHandle*/      0xffffffff              void *
*(void**)(ESP + 4 + 8)           /*BaseAddress*/        0x00daae30              void *
*(unsigned long*)(ESP + 4 + 12)  /*ZeroBits*/           0x00000000              unsigned long
*(unsigned long*)(ESP + 4 + 16)  /*CommitSize*/         0x00000000              unsigned long
*(long long**)(ESP + 4 + 20)     /*SectionOffset*/      0x00000000 {???}        __int64 *
*(unsigned long**)(ESP + 4 + 24) /*ViewSize*/           0x00daae28 {0x00000000} unsigned long *
*(int*)(ESP + 4 + 28)            /*InheritDisposition*/ 0x00000001              int
*(unsigned long*)(ESP + 4 + 32)  /*AllocationType*/     0x00800000              unsigned long
*(unsigned long*)(ESP + 4 + 36)  /*Protect*/            0x00000002              unsigned long
```
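To spell out the offsets used in that dump: on 32-bit x86, NtMapViewOfSection is __stdcall and each of its ten parameters occupies one 4-byte stack slot. The stub pushes nothing of its own (it only loads EAX and EDX), so while execution is stopped on one of its instructions, [ESP] holds the caller's return address and parameter i sits at ESP + 4 + 4*i. The same arithmetic gives the 40 bytes in the decorated name _NtMapViewOfSection@40 and in the stub's ret 28h. When reading the values, 0xffffffff is the NtCurrentProcess() pseudo-handle, InheritDisposition 1 is ViewShare, and Protect 2 is PAGE_READONLY. The sketch below only checks that arithmetic at compile time; ParamOffset and the constants are hypothetical names:

```cpp
#include <cstddef>

// 32-bit __stdcall: every NtMapViewOfSection parameter (handles, pointers,
// SIZE_T, ULONG, and the SECTION_INHERIT enum) takes one 4-byte stack slot.
constexpr std::size_t kSlot = 4;
constexpr std::size_t kParamCount = 10;  // SectionHandle ... Protect

// While stopped on one of the stub's own instructions, [ESP] is the return
// address and parameter `index` (0-based) follows it.
constexpr std::size_t ParamOffset(std::size_t index) {
    return kSlot + kSlot * index;
}

// Bytes popped by the callee: the "@40" decoration and the "ret 28h" operand.
constexpr std::size_t kParamBytes = kSlot * kParamCount;

static_assert(ParamOffset(0) == 4, "SectionHandle at ESP + 4 + 0");
static_assert(ParamOffset(6) == 4 + 24, "ViewSize at ESP + 4 + 24");
static_assert(ParamOffset(9) == 4 + 36, "Protect at ESP + 4 + 36");
static_assert(kParamBytes == 0x28, "matches ret 28h");
```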

Assembly walkthrough

I originally caught the exception by enabling break-on-throw in Visual Studio's debugger; I was then able to pinpoint the first failing call and place a breakpoint just before it. Here is what I see when stepping through the disassembly (> marks the current instruction):

```
  _NtMapViewOfSection@40:
  76F2EF60  mov         eax,28h  
  76F2EF65  mov         edx,offset _Wow64SystemServiceCall@0 (76F43430h)  
> 76F2EF6A  call        edx  
  76F2EF6C  ret         28h  
  76F2EF6F  nop
```

... step into:

```
  _Wow64SystemServiceCall@0:
> 76F43430  jmp         dword ptr [_Wow64Transition (76FD2218h)]
```

... step into:

```
> 74A37000  jmp         0033:74A37009  
  74A37007  add         byte ptr [eax],al  
  74A37009  inc         ecx  
  74A3700A  jmp         dword ptr [edi+0F8h]
```
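A note on that far jump: jmp 0033:74A37009 loads CS with selector 0x33, which on 64-bit Windows is the user-mode 64-bit code segment (32-bit WOW64 code runs with CS = 0x23). From there on the CPU executes in 64-bit mode, so the 32-bit disassembly of the following bytes is misleading: 0x41 is a REX prefix in 64-bit mode, and inc ecx / jmp dword ptr [edi+0F8h] actually execute as a single jmp qword ptr [r15+0F8h]. This is presumably also why the step-into does not land in the 64-bit handler but at the next 32-bit instruction the debugger gets to see. The sketch below splits a selector into its standard x86 fields; SelectorFields and DecodeSelector are hypothetical names:

```cpp
#include <cstdint>

// Fields of an x86 segment selector (16 bits):
//   bits 3-15  index into the descriptor table
//   bit  2     table indicator: 0 = GDT, 1 = LDT
//   bits 0-1   requested privilege level (RPL); 3 = user mode
struct SelectorFields {
    std::uint16_t index;
    bool usesLdt;
    std::uint16_t rpl;
};

constexpr SelectorFields DecodeSelector(std::uint16_t selector) {
    return SelectorFields{
        static_cast<std::uint16_t>(selector >> 3),
        ((selector >> 2) & 0x1u) != 0,
        static_cast<std::uint16_t>(selector & 0x3u),
    };
}

// CS values involved in the WOW64 transition on 64-bit Windows.
constexpr std::uint16_t kWow64Cs32 = 0x23;  // 32-bit (compatibility-mode) code
constexpr std::uint16_t kCs64 = 0x33;       // 64-bit code, target of "jmp 0033:..."
static_assert(DecodeSelector(kCs64).index == 6, "GDT entry 6");
static_assert(DecodeSelector(kWow64Cs32).index == 4, "GDT entry 4");
```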

... step into:

```
  _NtQueryObject@20:
  76F2EDC0  mov         eax,10h  
  76F2EDC5  mov         edx,offset _Wow64SystemServiceCall@0 (76F43430h)  
  76F2EDCA  call        edx  
> 76F2EDCC  ret         14h  
  76F2EDCF  nop
```

And the next step into triggers the exception.

Disturbances to the program's environment, such as:

- Updating compilers and runtimes (between MSVC90 and MSVC141), which revealed the bug in the first place;
- Switching between Release and Debug configurations;
- Forcing a base address for the OCX through the /base linker flag;
- Running with a debugger attached;
- Monitoring system calls with drstrace.exe;

... change which calls to NtMapViewOfSection succeed or fail, seemingly at random: not all of them fail, but a considerable number do. In fact, the first occurrence of the error is probably not indicative of where the problem actually originates: on rare occasions I've been able to make it crash earlier (upon unloading the control), and I even got a crash with none of our code on the call stack (by quitting TstCon.exe directly).

I can't find any documentation (official or otherwise) mentioning STATUS_BUFFER_TOO_SMALL or the 0xC0000023 code in this context. I haven't found any pattern in the failing calls, and a Dr. Memory run reported no relevant access errors.

So... What could possibly be happening inside this process for such symptoms to appear?

Solution

After poking at the ActiveX control's execution for a long time, using the debugger to skip over specific sections of code, I narrowed the bug down to a single function. It turns out that the origin of the problem was practically impossible to guess.

At some point, someone wanted to be able to give names to threads. To this end, they used the documented 0x406D1388 technique (in fact, the code is pretty much copy-pasted from the linked documentation). This, in itself, is fine. But then they wanted (or so I gather) to retrieve the name programmatically, which the exception-based method does not support. This was all before SetThreadDescription/GetThreadDescription existed, so they looked for another way. And man, were they inventive.
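For context on why the name could not simply be read back: with the 0x406D1388 technique, the name only travels in the exception's arguments. The code fills a THREADNAME_INFO block, raises exception 0x406D1388 with the block as its argument array, and swallows the exception in an __except handler; an attached debugger copies the name out along the way, but nothing in the process keeps it. The sketch below models the block's 32-bit layout from Microsoft's documentation with fixed-width integers so it compiles on any platform; ThreadNameInfo32, MakeThreadNameInfo and the constants are hypothetical names, and the real code uses the Windows types and RaiseException:

```cpp
#include <cstdint>

// Exception code that debuggers such as Visual Studio treat as "set thread name".
constexpr std::uint32_t kMsVcException = 0x406D1388u;

// THREADNAME_INFO as laid out in a 32-bit process (pointers are 4 bytes);
// field names follow Microsoft's thread-naming documentation.
struct ThreadNameInfo32 {
    std::uint32_t dwType;      // must be 0x1000
    std::uint32_t szName;      // address of the ANSI name, in the debuggee
    std::uint32_t dwThreadID;  // thread ID, or 0xFFFFFFFF (-1) for the calling thread
    std::uint32_t dwFlags;     // reserved, must be zero
};

constexpr ThreadNameInfo32 MakeThreadNameInfo(std::uint32_t nameAddress,
                                              std::uint32_t threadId) {
    return ThreadNameInfo32{0x1000u, nameAddress, threadId, 0u};
}

// RaiseException receives the block as sizeof(info) / sizeof(ULONG_PTR)
// arguments; ULONG_PTR is 4 bytes in a 32-bit process, hence 4 arguments.
constexpr std::uint32_t kArgumentCount32 = sizeof(ThreadNameInfo32) / 4;
static_assert(kArgumentCount32 == 4, "four ULONG_PTR arguments on x86");
```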

Alongside the call to the function that threw the custom exception were the following lines:

```cpp
// Grab the TIB.
P_T_TIB pTib = GetTIB();

if (pTib == NULL)
    return false;

// If someone has already written to the arbitrary field, I don't
// want to be overwriting it.
if (pTib->pvArbitrary == NULL)
{
    // Nothing's there. Set the name.
    pTib->pvArbitrary = (void *)pszName;
}
```

What's there, of course, is definitely not "nothing". GetTIB is defined as follows:

// A static function to get the TIB.
static P_T_TIB GetTIB()
{
    P_T_TIB pTib = NULL;

    _asm
    {
        MOV  EAX, FS:[18h]
        MOV  pTib, EAX
    }

    return pTib;
}

This piece of highly suspicious assembly, as I learned, retrieves a pointer to the Thread Information Block of the current thread. T_TIB's definition follows the layout described on that page, which lets the thread-naming function store a pointer to the name in the Arbitrary Data Slot. The name-getting function used the same technique to read the pointer back and return the name.
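To make the offsets concrete: on 32-bit Windows, FS points at the current thread's TIB, and the dword at FS:[18h] is the TIB's Self field, i.e. the TIB's own flat address, which is why GetTIB reads it to obtain a dereferenceable pointer. The slot the naming code wrote to is the field just before it, NT_TIB's ArbitraryUserPointer at offset 0x14. The sketch below models the first part of the 32-bit TIB with 32-bit integers so the offsets can be checked at compile time on any platform; NtTib32 is a hypothetical name, while the field names follow NT_TIB in winnt.h. With MSVC, the same pointer is available without inline assembly through NtCurrentTeb() or the __readfsdword(0x18) intrinsic.

```cpp
#include <cstddef>
#include <cstdint>

// Portable model of the 32-bit NT_TIB: every pointer is stored as a 32-bit
// integer so the offsets match a 32-bit process whatever the compile target.
struct NtTib32 {
    std::uint32_t ExceptionList;         // 0x00: head of the SEH chain
    std::uint32_t StackBase;             // 0x04
    std::uint32_t StackLimit;            // 0x08
    std::uint32_t SubSystemTib;          // 0x0C
    std::uint32_t FiberDataOrVersion;    // 0x10: union { FiberData; Version; }
    std::uint32_t ArbitraryUserPointer;  // 0x14: the slot used as "pvArbitrary"
    std::uint32_t Self;                  // 0x18: linear address of this NT_TIB
};

static_assert(offsetof(NtTib32, ArbitraryUserPointer) == 0x14, "FS:[14h]");
static_assert(offsetof(NtTib32, Self) == 0x18, "FS:[18h]");
static_assert(sizeof(NtTib32) == 0x1C, "seven 4-byte fields");
```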

Of course, the hitch is that "arbitrary" does not mean "user data": NtMapViewOfSection and friends rely on the field to store information, as a data breakpoint on the field showed when it was hit by LdrpMapViewOfSection, among others. Somewhere along their path, finding an unexpected non-zero value in the field (our name pointer) made them perform some very wrong operation, which ended up triggering that weird exception.

For the curious, I completely removed all of that, as it wasn't used anywhere else, and simply used a thread_local variable to store the name. Case closed!
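A minimal sketch of that replacement, assuming the name only needs to be readable from the thread that set it; SetCurrentThreadName, GetCurrentThreadName and NameSeenByNewThread are hypothetical names, not the ones from our code base. Each thread gets its own std::string, so nothing is written into system-owned per-thread structures. Where Windows 10 version 1607 or later can be assumed, SetThreadDescription/GetThreadDescription are the supported alternative and also make the name visible to debuggers.

```cpp
#include <cassert>
#include <string>
#include <thread>

// One string per thread, default-constructed (empty) in each thread, so naming
// one thread can never clobber another thread's name or any system-owned slot.
thread_local std::string t_threadName;

void SetCurrentThreadName(const std::string& name) { t_threadName = name; }

const std::string& GetCurrentThreadName() { return t_threadName; }

// Runs on a fresh thread: checks that it starts unnamed, names it, and
// returns the name as seen from that thread.
std::string NameSeenByNewThread(const std::string& name) {
    std::string seen;
    std::thread worker([&] {
        assert(GetCurrentThreadName().empty());
        SetCurrentThreadName(name);
        seen = GetCurrentThreadName();
    });
    worker.join();
    return seen;
}
```

Calling SetCurrentThreadName("main") and then NameSeenByNewThread("worker") leaves the main thread's name untouched, since the worker only ever writes its own copy.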