Send doesn't work properly in my NDIS modifying filter driver

I'm trying to implement a packet-modifying filter driver using NDIS. My approach is to drop the original packets and originate send/receive requests from cloned NBLs.

The docs on msdn say that's allowed: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ndis/nc-ndis-filter_send_net_buffer_lists

For each NET_BUFFER structure submitted to FilterSendNetBufferLists, a filter driver can do the following: ...

Copy the buffer and originate a send request with the copy. The send operation is similar to a filter-driver initiated send request. In this case, the driver must return the original buffer to the overlying driver by calling the NdisFSendNetBufferListsComplete function.

I implemented the RX path successfully using the following algorithm:

1. Filter gets the NBL in FilterReceiveNetBufferLists.
2. FilterReceiveNetBufferLists creates an NBL clone and enqueues it for further processing.
3. FilterReceiveNetBufferLists calls NdisFReturnNetBufferLists.
4. User connects to the exposed device, dequeues the packet and injects it again.
5. Device calls NdisFIndicateReceiveNetBufferLists.

The RX path is fine and the network works.

I did the same in the TX path:

1. Filter gets the NBL in FilterSendNetBufferLists.
2. FilterSendNetBufferLists creates an NBL clone and enqueues it for further processing.
3. FilterSendNetBufferLists calls NdisFSendNetBufferListsComplete.
4. User connects to the exposed device, dequeues the packet and injects it again.
5. Device calls NdisFSendNetBufferLists.

The TX path doesn't work. I'm testing it by sending ICMP packets (just pinging a DNS server's IP). I have Wireshark capturing between the router and my test machine. Wireshark sees the outgoing ICMP packets originated by the TX path (step 5), but there are no response packets.

What exactly happens when I'm calling NdisFSendNetBufferListsComplete in my FilterSendNetBufferLists? Does TCP/IP driver get an information that packet has been transmitted without any errors?

Solution

Off-the-cuff, I'd guess that you're not calling NdisCopySendNetBufferListInfo in the TX path, which means that the checksum offload metadata is getting lost.

If the NIC claims to support checksum offload (i.e., the NIC hardware can insert IPv4, TCPv4, and/or TCPv6 checksums), then the TCPIP driver won't make any attempt to put a valid checksum into the IPv4/TCP headers. (Actually, it explicitly puts the partial checksum in there, which is easy to compute in software, and a bit more difficult to compute in hardware.) The TCPIP driver will then set some flags in the NBL's Info fields that instruct the hardware exactly how to insert the checksum into the packet payload.
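To make the partial-checksum idea concrete, here is a minimal user-mode sketch in plain C (it is not driver code). It assumes the partial checksum is the folded, uncomplemented one's-complement sum of the TCP pseudo-header including the TCP length; the exact value the Windows stack writes may differ. All function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running one's-complement accumulator (big-endian words). */
static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t len)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                      /* odd trailing byte is padded with zero */
        acc += (uint32_t)p[0] << 8;
    return acc;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFF) + (acc >> 16);
    return (uint16_t)acc;
}

/* Sum of the IPv4 TCP pseudo-header: src, dst, protocol 6, TCP length. */
static uint32_t pseudo_sum(const uint8_t src[4], const uint8_t dst[4], uint16_t tcp_len)
{
    uint32_t acc = sum16(0, src, 4);
    acc = sum16(acc, dst, 4);
    return acc + 6 + tcp_len;
}

/* Host stack step (assumed): leave only the pseudo-header sum in the checksum field. */
void tcp_seed_partial_checksum(const uint8_t src[4], const uint8_t dst[4],
                               uint8_t *seg, size_t len)
{
    uint16_t partial = fold(pseudo_sum(src, dst, (uint16_t)len));
    seg[16] = (uint8_t)(partial >> 8);   /* checksum field is at offset 16 */
    seg[17] = (uint8_t)(partial & 0xFF);
}

/* NIC step: sum the segment (field already seeded), complement, store. */
void tcp_finish_checksum(uint8_t *seg, size_t len)
{
    uint16_t c = (uint16_t)~fold(sum16(0, seg, len));
    seg[16] = (uint8_t)(c >> 8);
    seg[17] = (uint8_t)(c & 0xFF);
}

/* Receiver check: pseudo-header plus whole segment must fold to 0xFFFF. */
int tcp_checksum_ok(const uint8_t src[4], const uint8_t dst[4],
                    const uint8_t *seg, size_t len)
{
    return fold(sum16(pseudo_sum(src, dst, (uint16_t)len), seg, len)) == 0xFFFF;
}
```

A segment that has only been through `tcp_seed_partial_checksum` fails `tcp_checksum_ok`; it passes only after `tcp_finish_checksum` runs. That finishing step is the work the stack expects the NIC to do when it has set the offload flags.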

When you clone the NBL, the clone doesn't, by default, inherit any of that metadata. So the cloned NBL has an incomplete checksum in the packet payload, yet is missing the instructions to the NIC hardware to insert the checksum.

The fix is simple: NdisCopySendNetBufferListInfo copies all the packet metadata that is pertinent to the TX path. (There's an analogous NdisCopyReceiveNetBufferListInfo for the RX path, which you should also look into calling from your driver.) You should call one of these routines whenever you clone an NBL and the clone will end up belonging to the same packet "flow" as the original NBL.
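As a rough illustration of why the explicit copy matters, here is a user-mode toy model in plain C. It is not NDIS code: `ToyNbl` and the `toy_*` functions are hypothetical stand-ins. `toy_clone` plays the role of NdisAllocateCloneNetBufferList, and `toy_copy_send_info` plays the role of NdisCopySendNetBufferListInfo. The one real detail it borrows is that an NBL carries its out-of-band metadata in an array of per-packet info slots, separate from the payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info slots, loosely modeled on the NBL's out-of-band info array. */
enum { INFO_CHECKSUM = 0, INFO_LSO = 1, INFO_VLAN = 2, INFO_COUNT = 3 };

typedef struct ToyNbl {
    const unsigned char *payload;   /* packet bytes */
    size_t length;
    uintptr_t info[INFO_COUNT];     /* per-packet metadata, 0 means "not set" */
} ToyNbl;

/* Stand-in for cloning: the clone describes the same payload but starts
 * with empty metadata slots. */
ToyNbl toy_clone(const ToyNbl *orig)
{
    ToyNbl clone = { orig->payload, orig->length, {0} };
    return clone;
}

/* Stand-in for the explicit metadata copy on the TX path. */
void toy_copy_send_info(ToyNbl *dst, const ToyNbl *src)
{
    memcpy(dst->info, src->info, sizeof dst->info);
}

/* The "NIC" only finishes the checksum if the offload slot is populated. */
bool toy_nic_will_insert_checksum(const ToyNbl *nbl)
{
    return nbl->info[INFO_CHECKSUM] != 0;
}
```

In this model a fresh clone reports that the NIC will not insert a checksum, and it only does so after `toy_copy_send_info` has run. In a real filter the corresponding step would be calling NdisCopySendNetBufferListInfo on the clone before passing it to NdisFSendNetBufferLists.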

Why doesn't NDIS automatically copy the metadata when you call NdisAllocateCloneNetBufferList? The superficial problem is that NDIS doesn't know whether we're doing TX or RX path. But the deeper problem is that NDIS doesn't know how badly you plan to mangle the packets. For example, if your driver rewrites the TCP header on an RX packet, it may be inappropriate to just naively copy over the NIC's TCP checksum computation and RSS hash.

So calling NdisCopySendNetBufferListInfo effectively means you're claiming that you didn't mangle the packets so much that they would look different to any hardware offload. E.g., you didn't insert protocol headers, change TCP port numbers, etc. (If you are doing those things, then you either have to additionally write some code to smooth over the offloads, or disable them altogether.)

BTW, this is an interesting and subtle question, which everyone's intuition gets wrong:

Does TCP/IP driver get an information that packet has been transmitted without any errors?

Ndis[F|M]SendNetBufferListsComplete does not mean that the packet has been transmitted without any errors. It means exactly one thing: the packet payload, MDL(s), NB(s), & NBL are no longer in use, and the protocol driver can repurpose them.

When transmitting to typical PCIe hardware, that means that the DMA to the NIC's onboard RAM is completed, and the NIC promises not to touch the packet payload buffer anymore.

That is a simple answer, but it raises an immediate follow-up question: if SendComplete doesn't mean that the packet was transmitted successfully, how does the protocol figure out whether the packet was transmitted successfully?

The answer to that is that protocols don't care whether the packet was transmitted to the next hop. What they really care about is whether the distant endpoint got the packet. And the only way to find out is some sort of ACK system. So nobody really bothered to build a signal that says the NIC hardware has actually transmitted the NBL to the next hop, since the protocol couldn't do much with that information anyway.

(Packet timestamping (IEEE 1588/PTP/NTP) is a bit of an exception to the above discussion. But even in that case, we don't actually want to know when the packet left the localhost. We actually want to know when the packet arrives at the remote endpoint. But the laws of physics being what they are, the latter is unknowable, so we have to settle for knowing when the TX packet departed the localhost.)

Note that if you are certain the packet did not transmit, then you can scribble an error code in NET_BUFFER_LIST::Status, and some protocols (e.g., UDP + winsock) will bubble that error up to the application. But in that case, you're just optimizing for a faster error path -- the application is still essentially obligated to build a network-level feedback mechanism (e.g. ACKs) to know if the packet got all the way to its destination.