Error trying to send abstract data through libnl and generic netlink


I'm trying to send abstract data using libnl and generic netlink. When I run the following code:

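struct nl_msg *msg;
int err = -1;

if ((msg = nlmsg_alloc()) == NULL)
    return err;

/* Generic netlink header: command CREATE_STATE, family version KLUA_VERSION. */
if (genlmsg_put(msg, ctrl->pid, NL_AUTO_SEQ, ctrl->genl_family,
        0, NLM_F_REQUEST, CREATE_STATE, KLUA_VERSION) == NULL)
    goto out;

/* STATE_NAME carries cmd->name as a NUL-terminated string. */
if ((err = nla_put_string(msg, STATE_NAME, cmd->name)) < 0)
    goto out;

if ((err = nl_send_auto(ctrl->sock, msg)) < 0)
    goto out;

err = 0;
out:
nlmsg_free(msg);  /* free the message on every path, not only on success */
return err;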

The kernel receives that message just fine. But if I change the code to this:

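struct nl_msg *msg;
struct nl_data *abstract;
int err = -1;

if ((msg = nlmsg_alloc()) == NULL)
    return err;

/* Wrap the raw bytes of *cmd (a struct klua_nl_state) in an nl_data object. */
if ((abstract = nl_data_alloc(cmd, sizeof(struct klua_nl_state))) == NULL)
    goto out_msg;

if (genlmsg_put(msg, ctrl->pid, NL_AUTO_SEQ, ctrl->genl_family,
        0, NLM_F_REQUEST, CREATE_STATE, KLUA_VERSION) == NULL)
    goto out;

/* Copy those bytes verbatim into TEST_ATTR (NLA_UNSPEC in the policy). */
if ((err = nla_put_data(msg, TEST_ATTR, abstract)) < 0)
    goto out;

if ((err = nl_send_auto(ctrl->sock, msg)) < 0)
    goto out;

err = 0;
out:
nl_data_free(abstract);
out_msg:
nlmsg_free(msg);
return err;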

By the way, the policy entry for TEST_ATTR is defined as:

[TEST_ATTR] = {.type = NLA_UNSPEC}

Why isn't the kernel receiving my message when the only thing I changed is its payload? How do I send abstract data through generic netlink and libnl?


Solution

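  • Since Linux 5.2, the kernel's attribute validator function (validate_nla()) contains a conditional that rejects any attribute whose policy type is NLA_UNSPEC whenever the caller requests NL_VALIDATE_UNSPEC, which essentially prohibits NLA_UNSPEC from being used.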

    I'm not sure whether that validation can be disabled. The main caller of validate_nla() hardcodes its validate argument to NL_VALIDATE_STRICT, which includes NL_VALIDATE_UNSPEC.
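
    To make the flag logic concrete, here is a heavily simplified userspace model of that check. It is not kernel code: the enum values are reproduced from include/net/netlink.h as I recall them (Linux 5.2 and later; verify against your own tree), and MODEL_NLA_*, model_policy_type and model_validate() are hypothetical stand-ins for the policy types and for the NLA_UNSPEC branch of validate_nla().

    ```c
    #include <stdio.h>

    /* Illustrative copies of the kernel's validation levels
     * (include/net/netlink.h, Linux >= 5.2); check your tree. */
    #define BIT(n) (1u << (n))
    enum netlink_validation {
        NL_VALIDATE_LIBERAL      = 0,
        NL_VALIDATE_TRAILING     = BIT(0),
        NL_VALIDATE_MAXTYPE      = BIT(1),
        NL_VALIDATE_UNSPEC       = BIT(2),
        NL_VALIDATE_STRICT_ATTRS = BIT(3),
        NL_VALIDATE_NESTED       = BIT(4),
    };
    #define NL_VALIDATE_STRICT (NL_VALIDATE_TRAILING | NL_VALIDATE_MAXTYPE | \
                                NL_VALIDATE_UNSPEC | NL_VALIDATE_STRICT_ATTRS | \
                                NL_VALIDATE_NESTED)

    /* Hypothetical, simplified policy types for this model only. */
    enum model_policy_type { MODEL_NLA_UNSPEC, MODEL_NLA_U32, MODEL_NLA_NESTED };

    /* Model of the NLA_UNSPEC branch: returns -1 (the kernel uses -EINVAL)
     * when the policy type is NLA_UNSPEC and NL_VALIDATE_UNSPEC is set. */
    static int model_validate(enum model_policy_type type, unsigned int validate)
    {
        if (type == MODEL_NLA_UNSPEC && (validate & NL_VALIDATE_UNSPEC))
            return -1;
        return 0;
    }

    int main(void)
    {
        printf("strict  + NLA_UNSPEC: %d\n",
               model_validate(MODEL_NLA_UNSPEC, NL_VALIDATE_STRICT));
        printf("liberal + NLA_UNSPEC: %d\n",
               model_validate(MODEL_NLA_UNSPEC, NL_VALIDATE_LIBERAL));
        printf("strict  + NLA_U32:    %d\n",
               model_validate(MODEL_NLA_U32, NL_VALIDATE_STRICT));
        return 0;
    }
    ```

    Because NL_VALIDATE_STRICT includes the NL_VALIDATE_UNSPEC bit, any caller that passes it rejects an NLA_UNSPEC attribute during parsing, regardless of what the payload contains.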

    But regardless, I suggest that you avoid using NLA_UNSPEC to send C structs without proper serialization. It's a disaster waiting to happen. C has a gotcha called "data structure padding": C compilers are allowed to insert unused bytes with unspecified contents (garbage) between the members of any structure, and after its last member. Thus, when you declare this:

    struct test {
        __u16 a;
        __u32 b;
    };
    

    The compiler is allowed to lay it out in memory as if you had written something like

    struct test {
        __u16 a;
        unsigned char garbage[2];
        __u32 b;
    };
    

    See the problem? If your kernel module and your userspace client were built by different compilers, by the same compiler with slightly different flags, or even by slightly different versions of the same compiler, the two sides might disagree on the structure's layout, and the kernel will read garbage no matter how correct your code looks.
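
    You can check what your own compiler does with offsetof() and sizeof(). This sketch uses the same two members as the struct above, with <stdint.h> types substituted for the kernel's __u16/__u32 so it builds as plain userspace C; padding_after_a() is a hypothetical helper name.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Same members as struct test above, using userspace fixed-width types. */
    struct test {
        uint16_t a;
        uint32_t b;
    };

    /* Number of padding bytes the compiler placed between a and b. */
    static size_t padding_after_a(void)
    {
        return offsetof(struct test, b) - sizeof(uint16_t);
    }

    int main(void)
    {
        printf("sizeof(struct test)      = %zu\n", sizeof(struct test));
        printf("offsetof(struct test, b) = %zu\n", offsetof(struct test, b));
        printf("padding between a and b  = %zu\n", padding_after_a());
        /* On common ABIs such as x86-64 this reports 8, 4 and 2:
         * six bytes of data plus two bytes of padding. */
        return 0;
    }
    ```

    Those padding bytes are copied verbatim when the whole struct is sent as an NLA_UNSPEC payload, so both ends must agree on exactly where they are.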

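    Use nested attributes instead of NLA_UNSPEC: put each struct member in its own typed attribute (NLA_U16, NLA_U32, ...) inside an NLA_NESTED container. Netlink pads every attribute to a 4-byte boundary for you, and because each field is sent separately, neither side depends on the other's struct layout.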
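
    A minimal sketch of what a nested attribute looks like on the wire, assuming a hypothetical container TEST_NEST holding TEST_A (a u16) and TEST_B (a u32). To stay self-contained it hand-encodes the netlink attribute layout (a 4-byte length/type header, payload padded to a 4-byte boundary) using local mirror definitions (attr_hdr, ATTR_*) instead of <linux/netlink.h>. With libnl you would normally let nla_nest_start(), nla_put_u16(), nla_put_u32() and nla_nest_end() build this, and declare matching NLA_NESTED, NLA_U16 and NLA_U32 policy entries in the kernel module.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Local mirrors of struct nlattr, NLA_ALIGN(), NLA_HDRLEN and NLA_F_NESTED
     * from <linux/netlink.h>, redefined so the sketch needs no kernel headers. */
    struct attr_hdr {
        uint16_t len;   /* header + payload, not counting trailing padding */
        uint16_t type;
    };
    #define ATTR_ALIGN(n)  (((size_t)(n) + 3u) & ~(size_t)3u)
    #define ATTR_HDRLEN    ATTR_ALIGN(sizeof(struct attr_hdr))
    #define ATTR_F_NESTED  (1u << 15) /* strict validation expects it on nests */

    /* Hypothetical attribute IDs: one container, two members inside it. */
    enum { TEST_NEST = 1 };
    enum { TEST_A = 1, TEST_B = 2 };

    /* Appends one attribute at buf + off and returns the next aligned offset.
     * Padding is zeroed so no uninitialized bytes end up in the message. */
    static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
                           const void *data, size_t len)
    {
        struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + len), type };
        memcpy(buf + off, &h, sizeof h);
        memcpy(buf + off + ATTR_HDRLEN, data, len);
        memset(buf + off + ATTR_HDRLEN + len, 0, ATTR_ALIGN(len) - len);
        return off + ATTR_HDRLEN + ATTR_ALIGN(len);
    }

    /* Serializes each field as its own attribute inside TEST_NEST. */
    static size_t encode_test(uint8_t *buf, uint16_t a, uint32_t b)
    {
        size_t off = ATTR_HDRLEN;          /* reserve room for the nest header */
        off = put_attr(buf, off, TEST_A, &a, sizeof a);
        off = put_attr(buf, off, TEST_B, &b, sizeof b);
        struct attr_hdr nest = { (uint16_t)off,
                                 (uint16_t)(TEST_NEST | ATTR_F_NESTED) };
        memcpy(buf, &nest, sizeof nest);   /* nest length covers all members */
        return off;
    }

    /* Walks the members of TEST_NEST; returns 0 if both fields were found. */
    static int decode_test(const uint8_t *buf, size_t buflen,
                           uint16_t *a, uint32_t *b)
    {
        struct attr_hdr nest, h;
        int found = 0;

        if (buflen < ATTR_HDRLEN)
            return -1;
        memcpy(&nest, buf, sizeof nest);
        if (!(nest.type & ATTR_F_NESTED) || nest.len > buflen)
            return -1;
        for (size_t off = ATTR_HDRLEN; off + ATTR_HDRLEN <= nest.len;
             off += ATTR_ALIGN(h.len)) {
            memcpy(&h, buf + off, sizeof h);
            if (h.len < ATTR_HDRLEN || off + h.len > nest.len)
                return -1;
            const uint8_t *payload = buf + off + ATTR_HDRLEN;
            size_t plen = h.len - ATTR_HDRLEN;
            if (h.type == TEST_A && plen == sizeof *a) {
                memcpy(a, payload, plen);
                found |= 1;
            } else if (h.type == TEST_B && plen == sizeof *b) {
                memcpy(b, payload, plen);
                found |= 2;
            }
        }
        return found == 3 ? 0 : -1;
    }

    int main(void)
    {
        uint8_t buf[64];
        uint16_t a = 0;
        uint32_t b = 0;
        size_t n = encode_test(buf, 0x1234, 0xdeadbeefu);

        if (decode_test(buf, n, &a, &b) == 0)
            printf("%zu bytes: a=0x%x b=0x%x\n", n, (unsigned)a, (unsigned)b);
        return 0;
    }
    ```

    Each field travels with its own type and length, so the receiver never needs to know how either compiler laid out a struct; the 2-byte u16 payload is simply padded to 4 bytes by the attribute rules rather than by the struct layout.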