It's going to be really hard to reduce the scope of this question, but here we go.
I'm working on a 32-bit ActiveX control that is loaded into a host (`TstCon.exe`). After unloading and reloading the control, I get a series of errors from `NtMapViewOfSection`. The first one occurs when `odbc32.dll` uses `LoadLibraryExW` to load `C:\Windows\system32\odbcint.dll`. At that point, an SEH exception is raised from somewhere inside `NtMapViewOfSection` with code `0xC0000023` (AKA `STATUS_BUFFER_TOO_SMALL`, according to the debugger).
Here's what the callstack looks like when the debugger intercepts the exception:
```
ntdll.dll!_NtMapViewOfSection@40()
KernelBase.dll!BasepLoadLibraryAsDataFileInternal()
KernelBase.dll!BasepLoadLibraryAsDataFile()
KernelBase.dll!LoadLibraryExW()
odbc32.dll!_InitializeDll@0()
odbc32.dll!_SQLAllocEnv@4()
<OurDll>.dll!<OurFunction>()
...
```
At that point, using perfectly sane techniques, I retrieved the arguments of the call to `NtMapViewOfSection` from the stack, following its documented signature:
```
*(void**)(ESP + 4 + 0)          /*SectionHandle*/       0x000003b0                   void *
*(void**)(ESP + 4 + 4)          /*ProcessHandle*/       0xffffffff                   void *
*(void**)(ESP + 4 + 8)          /*BaseAddress*/         0x00daae30                   void *
*(unsigned long*)(ESP + 4 + 12) /*ZeroBits*/            0x00000000                   unsigned long
*(unsigned long*)(ESP + 4 + 16) /*CommitSize*/          0x00000000                   unsigned long
*(long long**)(ESP + 4 + 20)    /*SectionOffset*/       0x00000000 {???}             __int64 *
*(unsigned long**)(ESP + 4 + 24) /*ViewSize*/           0x00daae28 {0x00000000}      unsigned long *
*(int*)(ESP + 4 + 28)           /*InheritDisposition*/  0x00000001                   int
*(unsigned long*)(ESP + 4 + 32) /*AllocationType*/      0x00800000                   unsigned long
*(unsigned long*)(ESP + 4 + 36) /*Protect*/             0x00000002                   unsigned long
```
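The offsets above follow from the 32-bit `__stdcall` convention: on entry to the callee, `[ESP]` holds the return address pushed by `call`, and the N-th 4-byte argument sits at `ESP + 4 + 4*N`. The same arithmetic accounts for the `ret 28h` in the disassembly below: 10 arguments times 4 bytes is 40 = `0x28`, which also matches the `@40` suffix of `_NtMapViewOfSection@40`. The sketch models this on a plain array standing in for a 32-bit stack; `Stack32`, `argOffset`, `calleeCleanup` and `readArg` are hypothetical helpers, and nothing here reads a real thread's stack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Every NtMapViewOfSection argument occupies one 4-byte slot on a 32-bit
// stack (pointers, HANDLEs, ULONGs and the enum are all 4 bytes wide).
constexpr std::size_t kSlotSize = 4;
constexpr std::size_t kArgCount = 10;

// Byte offset of argument `index` relative to ESP on entry to the callee:
// [ESP] is the return address, the arguments follow it in order.
constexpr std::size_t argOffset(std::size_t index) {
    return kSlotSize + kSlotSize * index;
}

// Bytes the callee pops with `ret imm16` under __stdcall.
constexpr std::size_t calleeCleanup(std::size_t argCount) {
    return kSlotSize * argCount;
}

// Hypothetical model of the stack: slot 0 is the return address and
// slots 1..10 are the arguments pushed right-to-left by the caller.
using Stack32 = std::array<std::uint32_t, 1 + kArgCount>;

// Reads an argument the way the watch expressions dereference ESP + 4 + offset.
inline std::uint32_t readArg(const Stack32& stack, std::size_t index) {
    return stack[argOffset(index) / kSlotSize];
}
```

Filling a `Stack32` with a dummy return address followed by the ten values from the watch window and calling `readArg(stack, 9)` returns the `Protect` value (2), at byte offset 40 from `ESP`.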
I originally caught the exception by enabling break-on-throw in Visual Studio's debugger. I was then able to pinpoint the first failing call and place a breakpoint just before it. Here is what I see when stepping through the disassembly (`>` marks the current instruction):
```
_NtMapViewOfSection@40:
  76F2EF60  mov   eax,28h
  76F2EF65  mov   edx,offset _Wow64SystemServiceCall@0 (76F43430h)
> 76F2EF6A  call  edx
  76F2EF6C  ret   28h
  76F2EF6F  nop

... step into:

_Wow64SystemServiceCall@0:
> 76F43430  jmp   dword ptr [_Wow64Transition (76FD2218h)]

... step into:

> 74A37000  jmp   0033:74A37009
  74A37007  add   byte ptr [eax],al
  74A37009  inc   ecx
  74A3700A  jmp   dword ptr [edi+0F8h]

... step into:

_NtQueryObject@20:
  76F2EDC0  mov   eax,10h
  76F2EDC5  mov   edx,offset _Wow64SystemServiceCall@0 (76F43430h)
  76F2EDCA  call  edx
> 76F2EDCC  ret   14h
  76F2EDCF  nop
```
The next step-into triggers the exception.
Disturbances to the program's environment, such as the `/base` linker flag, tracing with `drstrace.exe`, and so on, change which calls to `NtMapViewOfSection` succeed or fail, seemingly at random: not all of them fail, but a considerable number do. In fact, the first occurrence of the error is probably not indicative of where the problem actually originates, as I've occasionally been able to make it crash earlier (when unloading the control), and even got a crash with none of our code on the call stack (by quitting `TstCon.exe` directly).
I can't find any documentation (official or otherwise) mentioning `STATUS_BUFFER_TOO_SMALL` or the `0xC0000023` code in this context. I've been unable to find a pattern in the failing calls, and saw no relevant access errors from a Dr. Memory run.
So... What could possibly be happening inside this process for such symptoms to appear?
After poking at the ActiveX control's execution for a long time, using the debugger to skip over specific sections of code, I narrowed the possible location of the bug down to a single function. It turns out that the origin of the problem was pretty much impossible to guess.
Someone, at some point, wanted to be able to give names to threads. To this end, they used the documented `0x406D1388` exception technique (in fact, the code is pretty much copy-pasted from Microsoft's documentation). This, in itself, is fine. But then they wanted (or so I gather) to retrieve the name programmatically, which the custom exception method does not support. This was all before `SetThreadDescription`/`GetThreadDescription` existed, so they looked for another way. And man, were they inventive.
Alongside the call to the function that threw the custom exception were the following lines:
```cpp
// Grab the TIB.
P_T_TIB pTib = GetTIB();
if (pTib == NULL)
    return false;
// If someone has already written to the arbitrary field, I don't
// want to be overwriting it.
if (pTib->pvArbitrary == NULL)
{
    // Nothing's there. Set the name.
    pTib->pvArbitrary = (void *)pszName;
}
```
What's there, of course, is definitely not "nothing". `GetTIB` is defined as follows:
```cpp
// A static function to get the TIB.
static P_T_TIB GetTIB()
{
    P_T_TIB pTib = NULL;
    _asm
    {
        MOV EAX, FS:[18h]
        MOV pTib, EAX
    }
    return pTib;
}
```
As I learned, this highly suspicious piece of assembly retrieves a pointer to the current thread's Thread Information Block (TIB). `T_TIB`'s definition mirrors the published TIB layout and lets the thread-naming function store a pointer to the name in the arbitrary data slot. The name-getting function used the same technique to read that pointer back and return the name.
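For reference, the `NT_TIB` structure declared in `winnt.h` starts with seven pointer-sized fields. In a 32-bit process this places `ArbitraryUserPointer` (the slot `T_TIB` calls `pvArbitrary`) at `FS:[14h]` and the `Self` pointer at `FS:[18h]`, which is why `MOV EAX, FS:[18h]` yields the TIB's own address. The sketch reproduces that layout with fixed 4-byte fields so the offsets can be checked on any host; `NtTib32` is a hypothetical stand-in rather than the real header type, and the `FiberData`/`Version` union is flattened into a single field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit model of winnt.h's NT_TIB. Pointers are replaced by
// 4-byte integers so the layout matches a 32-bit process on any platform.
struct NtTib32 {
    std::uint32_t ExceptionList;         // FS:[00h] head of the SEH chain
    std::uint32_t StackBase;             // FS:[04h]
    std::uint32_t StackLimit;            // FS:[08h]
    std::uint32_t SubSystemTib;          // FS:[0Ch]
    std::uint32_t FiberDataOrVersion;    // FS:[10h] a union in the real header
    std::uint32_t ArbitraryUserPointer;  // FS:[14h] the slot the legacy code wrote to
    std::uint32_t Self;                  // FS:[18h] linear address of this TIB
};

static_assert(offsetof(NtTib32, ArbitraryUserPointer) == 0x14,
              "ArbitraryUserPointer lives at FS:[14h]");
static_assert(offsetof(NtTib32, Self) == 0x18,
              "Self lives at FS:[18h]");
static_assert(sizeof(NtTib32) == 0x1C, "seven 4-byte fields");
```

In a real 32-bit MSVC build, `NtCurrentTeb()` or the `__readfsdword(0x18)` intrinsic returns the same address without inline assembly, although that does not make writing to `ArbitraryUserPointer` any safer.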
Of course, the hitch is that "arbitrary" does not mean "user data": `NtMapViewOfSection` and friends rely on the field to store information, as evidenced by a data breakpoint on the field that was hit by `LdrpMapViewOfSection`, among others. Somewhere along that path, finding an unexpected non-zero value in the field (that is, our name pointer) led them to perform some very wrong operation, which ended up triggering that weird exception.
For the curious, I completely removed all of that, as it wasn't used anywhere else, and simply used a `thread_local` variable to store the name. Case closed!
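The replacement code isn't shown above, so the following is only a minimal sketch of a `thread_local` name store, assuming the name only ever needs to be read back on the thread that set it (which is all the TIB trick could offer anyway). `tlsThreadName`, `setCurrentThreadName` and `currentThreadName` are hypothetical names. Each thread gets its own copy of the variable, so no OS-owned structure is modified.

```cpp
#include <cassert>
#include <string>
#include <thread>

namespace {
// One instance per thread: created on first use in each thread and
// destroyed when that thread exits.
thread_local std::string tlsThreadName;
}

// Hypothetical setter: records a name for the calling thread only.
void setCurrentThreadName(const std::string& name) {
    tlsThreadName = name;
}

// Hypothetical getter: returns the calling thread's name, or an empty
// string if that thread never set one.
const std::string& currentThreadName() {
    return tlsThreadName;
}
```

Storing a copy rather than the caller's `pszName` pointer also removes any dependency on the lifetime of the string passed in. Nothing prevents keeping the `0x406D1388` naming call next to it so that debuggers still display the name.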