windowsnetworkingmemory-managementdriverndis

NDIS 6 Filter Driver with cloned Net Buffer List running extremely slowly


I am trying to learn about how to use NDIS filter drivers to intercept and edit network packets to and from a Windows machine. To that end I am writing an NDIS 6 filter driver that finds outgoing and incoming DNS packets using FilterSendNetBufferLists, FilterSendNetBufferListsComplete, FilterReceiveNetBufferLists, and FilterReturnNetBufferLists. All other NDIS functions are either set to NULL or I have left the template code unchanged.

In FilterSendNetBufferLists, I clone the NetBufferLists chain that is passed to the function from NDIS using NdisAllocateCloneNetBufferList, search the clone for some specific data, make a change to the data, then pass the clone down the network stack using NdisFSendNetBufferLists. In FilterSendNetBufferListsComplete, I free the clone using NdisFreeCloneNetBufferList. As far as I can tell, this process works as I intend with no obvious issues.

On the receive side, In FilterReceiveNetBufferLists, I do essentially the same thing. I clone the provided NetBufferLists, make changes, then pass the clone up the network stack using NdisFIndicateReceiveNetBufferLists. In FilterReturnNetBufferLists, I call NdisFReturnNetBufferLists to return the clone to NDIS. The changes I make to the packet make it to the application layer, but when the driver is installed and performing these operations, the internet on the target machine runs extremely slowly, to the point of being unusable. I can still ping websites using the command line, but I cannot get any websites to load via the browser. (I am only making changes to DNS packets for one specific website, so I am pretty certain that my changes are not the source of the issue.)

I have been working under the assumption that I need to free the memory associated with the receive clone, but no matter where I free that memory, I end up getting a system crash. Windbg tells me "An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high." The memory referenced is always NULL, and the stack text always has the following functions at the end:

ndis!ndisMSendCompleteNetBufferListsToOpen+0x6c ndis!ndisMSendCompleteNetBufferListsInternal+0x31e ndis!NdisMSendNetBufferListsComplete+0x24b I have tried freeing/returning the original NetBufferLists, but that always results in a system crash due to a double free error.

From what I can tell, I am not supposed to free any memory on the receive side, I am supposed to let NDIS handle that when the receive operation is complete. But if that is the case I have no idea what is bogging down my system when the driver is installed. Any advice on how to track this down or memory management in NDIS in general would be appreciated. Thanks!


Solution

  • In FilterSendNetBufferLists, I clone the NetBufferLists chain that is passed to the function from NDIS using NdisAllocateCloneNetBufferList, search the clone for some specific data, make a change to the data, then pass the clone down the network stack using NdisFSendNetBufferLists. In FilterSendNetBufferListsComplete, I free the clone using NdisFreeCloneNetBufferList.

    It's not clear from this description, so I'll just state something that possibly you already knew. You don't have to clone all the NBLs. You only have to clone the ones that you need to modify.

    I think the real problem here, though, is what "clone" really means. An NET_BUFFER_LIST has 4 bits of memory:

    NdisAllocateCloneNetBufferList will clone the first 2 always, and you decide whether it clones the 3rd based on whether you pass the NDIS_CLONE_FLAGS_USE_ORIGINAL_MDLS flag. The 4th is never cloned by NDIS, i.e., your shiny new NBL+NB and maybe MDL are still referring to the same old buffers from the original NBL.

    This means you can't free or complete the original NBL until the clone is freed. NDIS leaves some fields in the NBL to help you track this: the NET_BUFFER_LIST::ParentNetBufferList in the new clone NBL is yours to do whatever you want with; you can point it at the original NBL. (Likewise, you can use NET_BUFFER_LIST::ChildRefCount in the parent NBL if you expect to have multiple clones, but in this case it does not sound like you need that.)

    So at a high level, you'd do something like this:

    1. Your FilterSendNetBufferLists gets an NBL
    2. If the NBL is not interesting (not DNS or whatever), then just send it to NdisFSendNetBufferLists as-is. You're done.
    3. Otherwise, clone the NBL.
    4. Set the clone's ParentNetBufferList to point to the original NBL.
    5. Set the clone's SourceHandle to your FilterModuleHandle.
    6. Send the clone NBL. Do not send or complete or free the original NBL; leave it in limbo for now.

    Meanwhile, NDIS is also returning NBLs to your filter:

    1. Your FilterSendNetBufferListsComplete gets an NBL
    2. If the SourceHandle is not your FilterModuleHandle, then it's not one of your clones. Just pass it up to the next layer with NdisFSendNetBufferListsComplete. You're done.
    3. Otherwise, it's one of your clones. Find the original NBL with the clone's ParentNetBufferList pointer.
    4. Copy the NET_BUFFER_LIST::Status from the clone back to the parent.
    5. Free the clone.
    6. Complete the parent back to the higher layer with NdisFSendNetBufferListsComplete.

    Technically, if you've modified the payload buffer, you need to unmodify it before you complete the NBL. This rule is, uh, frequently broken, so I'll leave it to you to decide how strictly you want to adhere to the rules. Unmodifying the buffer would not cause a performance problem; it's usually just a problem of convenience. E.g. if you've overwritten the DNS server's IP address with your favorite DNS server's IP address, how do you get back the original bytes to restore? You can use the NBL context to remember them, but it's kind of inconvenient to do all this.

    On the receive side, In FilterReceiveNetBufferLists, I do essentially the same thing. I clone the provided NetBufferLists, make changes, then pass the clone up the network stack using NdisFIndicateReceiveNetBufferLists. In FilterReturnNetBufferLists, I call NdisFReturnNetBufferLists to return the clone to NDIS.

    This is incorrect. NBLs that came from NdisAllocateCloneNetBufferList must always be returned to NdisFreeCloneNetBufferList. (Unlike other operating systems, on NTOS you must always explicitly complete packet descriptors back to where they came from. There is no such thing as implicitly freeing the descriptor.)

    In general, I've described things as happening one NBL at a time, but to eke out the highest tier of performance, you need to batch NBLs. Or rather, when NDIS gives you a batch, you should avoid breaking things into 1-NBL-at-a-time. To be clear, even if you do things one NBL at time, this will not cause the level of performance problems that you've described, so this is not the root cause of your immediate problem. But if you're modifying only DNS traffic, it would be ideal if you can avoid un-batching all the non-DNS traffic on the system. We have a small library you are free to use to help with this: <ndl/nblclassify.h> is specifically meant to separate out "interesting" NBLs from the rest in a very efficient and batching-friendly fashion.

    Finally, as a general PSA, let me add that nearly 100% of newly-written filter drivers have this bug, so yours probably does too: your FilterReceiveNetBufferLists handler must special-case the NDIS_RECEIVE_FLAGS_RESOURCES flag. If this flag is set, then you must not call NdisFReturnNetBufferLists on those NBLs, and you must not use the NBLs after your FilterReceiveNetBufferLists handler has returned. As a rule of thumb for code review, it is always a bug if your filter driver does anything with the receive NBLs and yet does not handle NDIS_RECEIVE_FLAGS_RESOURCES (or the macro that wraps it, NDIS_TEST_RECEIVE_CANNOT_PEND).