linux go netlink audit-trail

Audit netlink responses don't have the right packet length


I have been trying to read the Linux audit logs from Go using mdlayher/netlink. I am able to make a connection and set the PID so that I can receive logs from the netlink socket over both unicast and multicast.

The problem is, when the library tries to parse the messages from netlink, it fails, and not because of the library. I tried to dump the messages that were sent to my connection and this is what I found.

([]uint8) (len=48 cap=4096) {
 00000000  1d 00 00 00 28 05 00 00  00 00 00 00 00 00 00 00  |....(...........|
 00000010  61 75 64 69 74 28 31 36  31 32 30 33 31 38 36 32  |audit(1612031862|
 00000020  2e 36 33 31 3a 32 37 31  30 34 29 3a 20 00 00 00  |.631:27104): ...|
}

This is one of the messages from the log stream. According to the packet structure, the first 16 bytes are the netlink message header, nlmsghdr.

struct nlmsghdr {
    __u32       nlmsg_len;  /* Length of message including header */
    __u16       nlmsg_type; /* Message content */
    __u16       nlmsg_flags;    /* Additional flags */
    __u32       nlmsg_seq;  /* Sequence number */
    __u32       nlmsg_pid;  /* Sending process port ID */
};

Note how nlmsg_len is documented as the length of the message including the header. In the message dump, the first __u32 is 1d 00 00 00, which in host byte order (little-endian here; netlink headers are not in network byte order) is 29. That means the whole packet should be 29 bytes. But if you count the bytes, the buffer holds 45 bytes plus 3 bytes of alignment padding. The message ends at byte 45, which is 29 + 16: the declared length plus the size of the message header.
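
To make the arithmetic concrete, here is a minimal sketch that decodes the header of the dump above by hand. It assumes a little-endian host (for example x86-64); the header type and the parseHeader helper are names made up for this illustration, not part of mdlayher/netlink.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nlmsgHdrLen is the size of struct nlmsghdr: 4 + 2 + 2 + 4 + 4 bytes.
const nlmsgHdrLen = 16

// header mirrors the fields of struct nlmsghdr (hypothetical type for this sketch).
type header struct {
	Len   uint32
	Type  uint16
	Flags uint16
	Seq   uint32
	PID   uint32
}

// parseHeader decodes the first 16 bytes of b as an nlmsghdr.
// Assumption: the host is little-endian.
func parseHeader(b []byte) (header, error) {
	if len(b) < nlmsgHdrLen {
		return header{}, fmt.Errorf("short buffer: %d bytes", len(b))
	}
	return header{
		Len:   binary.LittleEndian.Uint32(b[0:4]),
		Type:  binary.LittleEndian.Uint16(b[4:6]),
		Flags: binary.LittleEndian.Uint16(b[6:8]),
		Seq:   binary.LittleEndian.Uint32(b[8:12]),
		PID:   binary.LittleEndian.Uint32(b[12:16]),
	}, nil
}

// firstDump is the 48-byte buffer shown in the question.
var firstDump = []byte{
	0x1d, 0x00, 0x00, 0x00, 0x28, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x61, 0x75, 0x64, 0x69, 0x74, 0x28, 0x31, 0x36, 0x31, 0x32, 0x30, 0x33, 0x31, 0x38, 0x36, 0x32,
	0x2e, 0x36, 0x33, 0x31, 0x3a, 0x32, 0x37, 0x31, 0x30, 0x34, 0x29, 0x3a, 0x20, 0x00, 0x00, 0x00,
}

func main() {
	h, err := parseHeader(firstDump)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The declared length (29) is smaller than header + payload (45).
	fmt.Printf("nlmsg_len=%d type=%d buffer=%d\n", h.Len, h.Type, len(firstDump))
	fmt.Printf("nlmsg_len + header=%d\n", int(h.Len)+nlmsgHdrLen)
}
```

The declared length only matches the data actually received once the 16 header bytes are added to it.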

But the strange thing is that it only happens on audit log messages, not on audit control message replies. Refer to https://play.golang.com/p/mA7_MJdVSv8 for examples of how the packet structures are parsed.

Is this expected? Looking at the Go stdlib syscall.ParseNetlinkMessage, it seems to expect nlmsg_len to cover header + body. I can't work out how this is handled in the userspace-audit code, which is responsible for auditd, auditctl and its family of tools.

Another popular library, slackhq/go-audit, seems not to rely on the header length and instead parses based on the size of the buffer read from the socket.
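
As a rough illustration of that buffer-size approach (this is a sketch of the idea, not go-audit's actual code), the parser below ignores nlmsg_len entirely and treats everything after the 16-byte header as payload. It assumes one message per read and a little-endian host; auditMessage and parseByBufferSize are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// nlmsgHdrLen is the size of struct nlmsghdr.
const nlmsgHdrLen = 16

// auditMessage is a hypothetical result type for this sketch.
type auditMessage struct {
	Type uint16
	Data string
}

// parseByBufferSize treats buf as exactly one netlink message and uses the
// number of bytes read, not nlmsg_len, to find the end of the payload.
// Trailing NUL padding is trimmed.
func parseByBufferSize(buf []byte) (auditMessage, error) {
	if len(buf) < nlmsgHdrLen {
		return auditMessage{}, fmt.Errorf("short read: %d bytes", len(buf))
	}
	payload := bytes.TrimRight(buf[nlmsgHdrLen:], "\x00")
	return auditMessage{
		Type: binary.LittleEndian.Uint16(buf[4:6]), // assumes little-endian host
		Data: string(payload),
	}, nil
}

// sampleRead is the 48-byte buffer from the first dump in the question.
var sampleRead = []byte{
	0x1d, 0x00, 0x00, 0x00, 0x28, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x61, 0x75, 0x64, 0x69, 0x74, 0x28, 0x31, 0x36, 0x31, 0x32, 0x30, 0x33, 0x31, 0x38, 0x36, 0x32,
	0x2e, 0x36, 0x33, 0x31, 0x3a, 0x32, 0x37, 0x31, 0x30, 0x34, 0x29, 0x3a, 0x20, 0x00, 0x00, 0x00,
}

func main() {
	msg, err := parseByBufferSize(sampleRead)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("type=%d data=%q\n", msg.Type, msg.Data)
}
```

The trade-off is that this only works when each read returns a single message; a buffer holding several messages would still need a trustworthy length field to split them.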

This diff against the mdlayher/netlink library seems to work around the above issue and also dumps the bytes of the payload. But this shouldn't be necessary.

diff --git a/conn_linux.go b/conn_linux.go
index ef18ef7..561ac69 100644
--- a/conn_linux.go
+++ b/conn_linux.go
@@ -11,6 +11,8 @@ import (
        "time"
        "unsafe"

+       "github.com/davecgh/go-spew/spew"
+       "github.com/josharian/native"
        "golang.org/x/net/bpf"
        "golang.org/x/sys/unix"
 )
@@ -194,7 +196,13 @@ func (c *conn) Receive() ([]Message, error) {

        raw, err := syscall.ParseNetlinkMessage(b[:n])
        if err != nil {
-               return nil, err
+               spew.Dump(b[:n])
+               bl := native.Endian.Uint32(b[:4]) + syscall.NLMSG_HDRLEN
+               native.Endian.PutUint32(b[:4], bl)
+               raw, err = syscall.ParseNetlinkMessage(b[:n])
+               if err != nil {
+                       return nil, err
+               }
        }

        msgs := make([]Message, 0, len(raw))

code for reproducing the above behaviour


Update 1

The above behavior seems to crop up when I try to alter the audit state via the AUDIT_SET message type. If I only connect to the read-only multicast group AUDIT_NLGRP_READLOG, it doesn't seem to happen. Also, if I close the unicast connection and then try multicast, the issue is back again. Basically, as long as my PID is bound to the socket, this issue comes back up. Sample dump when only connecting via the multicast group:

([]uint8) (len=76 cap=4096) {
 00000000  49 00 00 00 1d 05 00 00  00 00 00 00 00 00 00 00  |I...............|
 00000010  61 75 64 69 74 28 31 36  31 32 31 36 35 31 33 31  |audit(1612165131|
 00000020  2e 30 31 36 3a 33 33 34  37 32 29 3a 20 61 72 67  |.016:33472): arg|
 00000030  63 3d 32 20 61 30 3d 22  61 75 64 69 74 63 74 6c  |c=2 a0="auditctl|
 00000040  22 20 61 31 3d 22 2d 73  22 00 00 00              |" a1="-s"...|
}

Notice how the size is 0x49 = 73, the exact packet length (the remaining 3 bytes of the 76-byte buffer are padding).


Solution

  • I don't know anything about this personally, but I found this question via a search, and another result of that search was this blog article: https://blog.des.no/2020/08/netlink-auditing-and-counting-bytes/

    To summarize, someone else found that behavior too, and it seems to be a bug. Here's the relevant quote:

    Bug #3: The length field on audit data messages does not include the length of the header.

    This is jaw-dropping. It is so fundamentally wrong. It means that anyone who wants to talk to the audit subsystem using their own code instead of libaudit will have to add a workaround to the Netlink layer of their stack to either fix or ignore the error, and apply that workaround only for certain message types.

    How has this gone unnoticed? Well, libaudit doesn’t do much input validation.

    (...)

    The odds of these bugs getting fixed is approximately zero, because existing applications will break in interesting ways if the kernel starts setting the length field correctly.