c windows driver ndis irql

Access to global variable after calling NdisAcquireSpinLock causes IRQL_NOT_LESS_OR_EQUAL BSoD


I have an NDIS filter driver (an update for WinPcap) and tested it on a Windows 10 build 10586 x64 VM. With Driver Verifier enabled, it causes an IRQL_NOT_LESS_OR_EQUAL BSoD when I launch Wireshark (i.e., when my driver's functions get used).

Here's the dump:

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: fffff80137694a20, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000008, bitfield :
    bit 0 : value 0 = read operation, 1 = write operation
    bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80137694a20, address which referenced memory

Debugging Details:
------------------

***** Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.


DUMP_CLASS: 1

DUMP_QUALIFIER: 400

BUILD_VERSION_STRING:  10586.103.amd64fre.th2_release.160126-1819

SYSTEM_MANUFACTURER:  VMware, Inc.

VIRTUAL_MACHINE:  VMware

SYSTEM_PRODUCT_NAME:  VMware Virtual Platform

SYSTEM_VERSION:  None

BIOS_VENDOR:  Phoenix Technologies LTD

BIOS_VERSION:  6.00

BIOS_DATE:  07/02/2015

BASEBOARD_MANUFACTURER:  Intel Corporation

BASEBOARD_PRODUCT:  440BX Desktop Reference Platform

BASEBOARD_VERSION:  None

DUMP_TYPE:  2

BUGCHECK_P1: fffff80137694a20

BUGCHECK_P2: 2

BUGCHECK_P3: 8

BUGCHECK_P4: fffff80137694a20

READ_ADDRESS: unable to get nt!MiSessionIdBitmap
Unable to get value of nt!MiSessionWsList
 fffff80137694a20 

CURRENT_IRQL:  0

FAULTING_IP: 
+0
fffff801`37694a20 4883ec08        sub     rsp,8

CPU_COUNT: 2

CPU_MHZ: 961

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3c

CPU_STEPPING: 3

CPU_MICROCODE: 0,0,0,0 (F,M,S,R)  SIG: 1E'00000000 (cache) 0'00000000 (init)

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  CORRUPT_MODULELIST_AV

BUGCHECK_STR:  AV

ANALYSIS_SESSION_HOST:  AKISN0W-PC

ANALYSIS_SESSION_TIME:  03-18-2016 09:48:01.0434

ANALYSIS_VERSION: 10.0.10586.567 amd64fre

LAST_CONTROL_TRANSFER:  from fffff801373c7fe9 to fffff801373bd480

FAILED_INSTRUCTION_ADDRESS: 
+0
fffff801`37694a20 4883ec08        sub     rsp,8

SYMBOL_ON_RAW_STACK:  1

STACK_ADDR_RAW_STACK_SYMBOL: ffffd0012ba372e8

STACK_COMMAND:  dps ffffd0012ba372e8-0x20 ; kb

STACK_TEXT:  
ffffd001`2ba372c8  fffff801`376876d6
ffffd001`2ba372d0  fffff801`3792eebe
ffffd001`2ba372d8  fffff801`372ef2c2
ffffd001`2ba372e0  fffff800`71272b02 npf!NPF_GetCopyFromOpenArray+0x22 [j:\npcap\packetwin7\npf\npf\openclos.c @ 1084]
ffffd001`2ba372e8  fffff800`71272ec5 npf!NPF_OpenAdapter+0x2d [j:\npcap\packetwin7\npf\npf\openclos.c @ 258]
ffffd001`2ba372f0  00000000`00000000
ffffd001`2ba372f8  00000000`00000000
ffffd001`2ba37300  00000000`00000000
ffffd001`2ba37308  00000000`00000000
ffffd001`2ba37310  fffff801`37694a20
ffffd001`2ba37318  fffff800`71272ec5 npf!NPF_OpenAdapter+0x2d [j:\npcap\packetwin7\npf\npf\openclos.c @ 258]
ffffd001`2ba37320  fffff801`3792eebe
ffffd001`2ba37328  fffff801`372ef2c2
ffffd001`2ba37330  fffff801`37690d68
ffffd001`2ba37338  fffff801`376876d6
ffffd001`2ba37340  fffff801`376860dc


FOLLOWUP_IP: 
npf!NPF_GetCopyFromOpenArray+22 [j:\npcap\packetwin7\npf\npf\openclos.c @ 1084]
fffff800`71272b02 488b1d177e0000  mov     rbx,qword ptr [npf!g_arrOpen (fffff800`7127a920)]

FAULT_INSTR_CODE:  171d8b48

FAULTING_SOURCE_LINE:  j:\npcap\packetwin7\npf\npf\openclos.c

FAULTING_SOURCE_FILE:  j:\npcap\packetwin7\npf\npf\openclos.c

FAULTING_SOURCE_LINE_NUMBER:  1084

FAULTING_SOURCE_CODE:  
  1080:     POPEN_INSTANCE CurOpen;
  1081:     TRACE_ENTER();
  1082: 
  1083:     NdisAcquireSpinLock(&g_OpenArrayLock);
> 1084:     for (CurOpen = g_arrOpen; CurOpen != NULL; CurOpen = CurOpen->Next)
  1085:     {
  1086:         if (CurOpen->AdapterBindingStatus == ADAPTER_BOUND && NPF_EqualAdapterName(&CurOpen->AdapterName, pAdapterName) == TRUE)
  1087:         {
  1088:             NdisReleaseSpinLock(&g_OpenArrayLock);
  1089:             return NPF_DuplicateOpenObject(CurOpen, DeviceExtension);


SYMBOL_NAME:  npf!NPF_GetCopyFromOpenArray+22

FOLLOWUP_NAME:  MachineOwner

DEBUG_FLR_IMAGE_TIMESTAMP:  0

IMAGE_VERSION:  0.6.0.301

MODULE_NAME: Unknown_Module

IMAGE_NAME:  Unknown_Image

BUCKET_ID:  CORRUPT_MODULELIST_AV

PRIMARY_PROBLEM_CLASS:  CORRUPT_MODULELIST

FAILURE_BUCKET_ID:  CORRUPT_MODULELIST_AV

TARGET_TIME:  2016-03-18T01:43:34.000Z

OSBUILD:  10586

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  272

PRODUCT_TYPE:  1

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  unknown_date

BUILDDATESTAMP_STR:  160126-1819

BUILDLAB_STR:  th2_release

BUILDOSVER_STR:  10.0.10586.103.amd64fre.th2_release.160126-1819

ANALYSIS_SESSION_ELAPSED_TIME: 18d9

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:corrupt_modulelist_av

FAILURE_ID_HASH:  {fc259191-ef0c-6215-476f-d32e5dcaf1b7}

Followup:     MachineOwner
---------

The faulty source code is here: https://github.com/nmap/npcap/blob/master/packetWin7/npf/npf/Openclos.c

I know that the NdisAcquireSpinLock call raises the IRQL to DISPATCH_LEVEL, and WinDbg seems to say that g_arrOpen is in pageable memory, which must not be accessed at DISPATCH_LEVEL. However, g_arrOpen is a global variable that points to an OPEN_INSTANCE struct. OPEN_INSTANCE instances are allocated from non-paged pool, and a global variable lives in the driver image, so it can't be paged out either.

So I don't know what's wrong here. Any help? Thanks!


Solution

  • The global variable isn't the problem. First, note that Arg3 has the execute bit set, i.e., the paged-out memory is code, not data. You can confirm this by noting that READ_ADDRESS and FAULTING_IP are the same address (as are Arg1 and Arg4).
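As a rough illustration of that reading of the dump, here is a minimal user-mode sketch that decodes Arg3 using only the two bits listed in the `!analyze -v` output above. The names `fault_kind`, `classify_fault`, and `fault_is_instruction_fetch` are hypothetical helpers invented for this example; they are not part of WinDbg or the WDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of bugcheck 0xA parameter 3, as printed by !analyze -v. */
#define FAULT_BIT_WRITE   (1u << 0)  /* 0 = read, 1 = write */
#define FAULT_BIT_EXECUTE (1u << 3)  /* 1 = execute (instruction fetch) */

/* Hypothetical classification type, for illustration only. */
typedef enum { FAULT_READ, FAULT_WRITE, FAULT_EXECUTE } fault_kind;

/* Classify the faulting access from Arg3; the execute bit takes priority. */
static fault_kind classify_fault(uint64_t arg3)
{
    if (arg3 & FAULT_BIT_EXECUTE) return FAULT_EXECUTE;
    if (arg3 & FAULT_BIT_WRITE)   return FAULT_WRITE;
    return FAULT_READ;
}

/*
 * The fault is a fetch of the faulting instruction itself when the execute
 * bit is set and the referenced memory (Arg1) equals the address that
 * referenced it (Arg4).
 */
static bool fault_is_instruction_fetch(uint64_t arg1, uint64_t arg3,
                                       uint64_t arg4)
{
    return classify_fault(arg3) == FAULT_EXECUTE && arg1 == arg4;
}
```

With the values from this dump (Arg1 = Arg4 = fffff80137694a20, Arg3 = 8), `fault_is_instruction_fetch` returns true, which matches the conclusion that code, not data, was unavailable.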

    So, let's look at that code more closely:

    > 1084:     for (CurOpen = g_arrOpen; CurOpen != NULL; CurOpen = CurOpen->Next)
      1085:     {
      1086:         if (CurOpen->AdapterBindingStatus == ADAPTER_BOUND && NPF_EqualAdapterName(&CurOpen->AdapterName, pAdapterName) == TRUE)
    

    This is a release build, so you can't take the indicated line too seriously; the problem is, however, likely to be nearby. A page fault on an instruction fetch suggests a bad function call, so let's start by looking at NPF_EqualAdapterName:

    BOOLEAN
    NPF_EqualAdapterName(
        PNDIS_STRING s1,
        PNDIS_STRING s2
        )
    {
        // return RtlEqualMemory(s1->Buffer, s2->Buffer, s2->Length);
        // We use RtlEqualUnicodeString because it's case-insensitive. However, verifier will complain about this call because it's under DISPATCH_LEVEL.
        // Just don't enable the IRQL switch when testing with verifier.
        return RtlEqualUnicodeString(s1, s2, TRUE);
    }
    

    It's a very short function, so it was almost certainly inlined and wouldn't necessarily turn up in the stack trace. That leads us to the call to RtlEqualUnicodeString, which, when we check the documentation, turns out to require PASSIVE_LEVEL. Bingo. (Heck, we only needed the documentation for verification, since the comments outright state that the call is illegal.)

    Conclusion: RtlEqualUnicodeString is pageable code, and it happened to be paged out when you called it at DISPATCH_LEVEL.

    (At a guess, the best solution would be to go back to using RtlEqualMemory and make sure that your comparison string is properly cased ahead of time.)
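To make that suggestion concrete, here is a minimal user-mode sketch of the idea: normalize the case of each name once, up front, and then compare with a plain byte comparison. Everything in it is an assumption for illustration. `counted_name` is a hypothetical stand-in for a counted UTF-16 string such as NDIS_STRING, `memcmp` stands in for RtlEqualMemory, and the upcasing handles ASCII letters only. In the driver, the normalization would have to happen before the spin lock is taken, and both buffers would have to be in non-paged memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a counted UTF-16 string (e.g. NDIS_STRING). */
typedef struct {
    uint16_t  length;  /* length in bytes, not characters */
    uint16_t *buffer;  /* not necessarily NUL-terminated */
} counted_name;

/* Upcase ASCII letters in place. Call this once, before any lock is held. */
static void name_upcase_ascii(counted_name *s)
{
    size_t n = s->length / sizeof(uint16_t);
    for (size_t i = 0; i < n; i++) {
        if (s->buffer[i] >= 'a' && s->buffer[i] <= 'z')
            s->buffer[i] = (uint16_t)(s->buffer[i] - 'a' + 'A');
    }
}

/*
 * Exact comparison of two pre-normalized names. It calls nothing beyond a
 * byte comparison, so it is the kind of check that stays safe under a lock.
 */
static bool name_equal_exact(const counted_name *a, const counted_name *b)
{
    return a->length == b->length &&
           memcmp(a->buffer, b->buffer, a->length) == 0;
}
```

Unlike the commented-out RtlEqualMemory line in the original function, this sketch also checks that the two lengths match, so a name that is merely a prefix of another does not compare equal.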