I'm trying to send a struct from user space to my kernel module using Netlink. My struct in user space is:
struct test {
    unsigned int length;
    char name[MAX_NAME_LENGTH];
};
and in kernel space it is:
struct test {
    __u32 length;
    char name[MAX_NAME_LENGTH];
};
where MAX_NAME_LENGTH is a macro defined as 50.
In user space, my main function sends the struct to the kernel with the following code:
int main(){
    struct iovec iov[2];
    int sock_fd;
    struct sockaddr_nl src_add;
    struct sockaddr_nl dest_add;
    struct nlmsghdr * nl_hdr = NULL;
    struct msghdr msg;
    struct test message;
    memset(&message, 0, sizeof(struct test));
    message.length = 18;
    strcpy(message.name, "Just a test\0");
    sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_USER);
    if (sock_fd < 0){
        printf("Netlink socket creation failed\n");
        return -1;
    }
    memset(&src_add, 0, sizeof(src_add));
    src_add.nl_family = AF_NETLINK;
    src_add.nl_pid = getpid();
    memset(&dest_add, 0, sizeof(dest_add));
    dest_add.nl_family = AF_NETLINK;
    dest_add.nl_pid = 0; // Send to linux kernel
    dest_add.nl_groups = 0; // Unicast
    bind(sock_fd,(struct sockaddr *)&src_add,sizeof(src_add));
    nl_hdr = (struct nlmsghdr *) malloc(NLMSG_SPACE(sizeof(struct test)));
    memset(nl_hdr, 0, NLMSG_SPACE(sizeof (struct test)));
    nl_hdr->nlmsg_len = NLMSG_SPACE(sizeof(struct test));
    nl_hdr->nlmsg_pid = getpid();
    nl_hdr->nlmsg_flags = 0;
    iov[0].iov_base = (void *)nl_hdr;
    iov[0].iov_len = nl_hdr->nlmsg_len;
    iov[1].iov_base = &message;
    iov[1].iov_len = sizeof(struct test);
    memset(&msg,0, sizeof(msg));
    msg.msg_name = (void *)&dest_add;
    msg.msg_namelen = sizeof(dest_add);
    msg.msg_iov = &iov[0];
    msg.msg_iovlen = 2;
    sendmsg(sock_fd,&msg,0);
    close(sock_fd);
    return 0;
}
On the kernel side, I've registered a function called callback to be called every time a message is received. This is the callback function:
static void callback(struct sk_buff *skb){
    struct nlmsghdr *nl_hdr;
    struct test * msg_rcv;
    nl_hdr = (struct nlmsghdr*)skb->data;
    msg_rcv = (struct test*) nlmsg_data(nl_hdr);
    printk(KERN_INFO "Printing the length and name in the struct:%u, %s\n",msg_rcv->length, msg_rcv->name);
}
When I run this code and check the dmesg output, I get the following message: "Printing the length and name in the struct:0, ". So why aren't the fields that I filled in on the user-space side being sent to the kernel?
Btw, NETLINK_USER is defined as 31.
I'm going to first explain the one superficial issue that prevents your code from doing what you want, then explain why what you want is a bad idea, then explain the right solution.
You "want" to send a packet consisting of a netlink header followed by a struct. In other words, this:
+-----------------+-------------+
| struct nlmsghdr | struct test |
|   (16 bytes)    | (54 bytes)  |
+-----------------+-------------+
The problem is that's not what you're telling your iovec. According to your iovec code, the packet looks like this:
+-----------------+--------------+-------------+
| struct nlmsghdr | struct test  | struct test |
|   (16 bytes)    |  (54 bytes)  | (54 bytes)  |
|     (data)      | (all zeroes) |   (data)    |
+-----------------+--------------+-------------+
This line:
iov[0].iov_len = nl_hdr->nlmsg_len;
should be this:
iov[0].iov_len = NLMSG_HDRLEN;
because your first iovec slot is just the Netlink header, not the whole packet.
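To double-check the arithmetic, here is a minimal user-space sketch of the corrected scatter/gather setup. It assumes the struct from the question; fill_iov is a hypothetical helper (not part of any Netlink API), and setting nlmsg_len with NLMSG_LENGTH (header plus payload) is a choice made for this sketch rather than something the one-line fix requires:

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

#define MAX_NAME_LENGTH 50

struct test {
    unsigned int length;
    char name[MAX_NAME_LENGTH];
};

/*
 * Hypothetical helper: describes one Netlink message as two iovec slots
 * (header, then payload) and returns the number of bytes that sendmsg()
 * would transmit for them.
 */
static size_t fill_iov(struct iovec iov[2], struct nlmsghdr *hdr,
                       struct test *payload)
{
    memset(hdr, 0, sizeof(*hdr));
    /* Total message length: aligned header plus payload. */
    hdr->nlmsg_len = NLMSG_LENGTH(sizeof(*payload));

    iov[0].iov_base = hdr;
    iov[0].iov_len = NLMSG_HDRLEN;      /* header only */
    iov[1].iov_base = payload;
    iov[1].iov_len = sizeof(*payload);  /* payload only */

    return iov[0].iov_len + iov[1].iov_len;
}
```

With the slots sized this way, the bytes handed to sendmsg() add up to exactly nlmsg_len, so the payload lands right after the header where nlmsg_data() expects it.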
C has a gotcha called "data structure padding." Don't skip this lecture; I'd argue that anyone who deals with the C language MUST read it ASAP: http://www.catb.org/esr/structure-packing/
The gist of it is that C compilers are allowed to introduce garbage between the members of any structure. Thus, when you declare this:
struct test {
    unsigned int length;
    char name[MAX_NAME_LENGTH];
};
The compiler is technically allowed to mutate that during implementation into something like
struct test {
    unsigned int length;
    unsigned char garbage[4];
    char name[MAX_NAME_LENGTH];
};
See the problem? If your kernel module and your userspace client were generated by different compilers, or by the same compiler but with slightly different flags, or even by slightly different versions of the same compiler, the structures might differ and the kernel will receive garbage, no matter how correct your code looks.
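You don't have to take the layout on faith; sizeof and offsetof will report what a given compiler actually did. A minimal sketch, assuming the struct from the question (the two helper names are made up for illustration):

```c
#include <stddef.h>

#define MAX_NAME_LENGTH 50

struct test {
    unsigned int length;
    char name[MAX_NAME_LENGTH];
};

/* Bytes the compiler added beyond the declared members. */
static size_t test_padding(void)
{
    return sizeof(struct test) - sizeof(unsigned int) - MAX_NAME_LENGTH;
}

/* Byte offset at which this compiler placed the name member. */
static size_t test_name_offset(void)
{
    return offsetof(struct test, name);
}
```

On typical x86-64 GCC builds this should report two bytes of trailing padding and no gap before name, but the whole point is that those numbers are the compiler's decision, not yours.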
Update: Someone asked me to elaborate on that, so here it goes:
Suppose you have the following structure:
struct example {
    __u8 value8;
    __u16 value16;
};
In userspace, the compiler decides to leave it as is. However, in kernelspace the compiler "randomly" decides to convert it to:
struct example {
    __u8 value8;
    __u8 garbage;
    __u16 value16;
};
In your userspace client, you then write this code:
struct example x;
x.value8 = 0x01;
x.value16 = 0x0203;
In memory, the structure will look like this:
01 <- value8
02 <- First byte of value16
03 <- Second byte of value16
When you send that to the kernel, the kernel will, of course, receive the same thing:
01
02
03
But it will interpret it differently:
01 <- value8
02 <- garbage
03 <- First byte of value16
junk <- Second byte of value16
(End of Update)
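The mismatch from the update can be reproduced inside a single user-space program. This is only a sketch: the two struct names and the reinterpret helper are hypothetical, and the packed attribute (a GCC/Clang extension) stands in for "a compiler that chose not to pad":

```c
#include <stdint.h>
#include <string.h>

/* Sender's view: no padding (forced here with a GCC/Clang attribute). */
struct example_packed {
    uint8_t value8;
    uint16_t value16;
} __attribute__((packed));

/* Receiver's view: the compiler's default layout, usually one pad byte. */
struct example_padded {
    uint8_t value8;
    uint16_t value16;
};

/* Copies the sender's raw bytes into the receiver's struct, as a socket would. */
static struct example_padded reinterpret(void)
{
    struct example_packed sent;
    struct example_padded received;

    sent.value8 = 0x01;
    sent.value16 = 0x0203;

    /* Zero the receiver so the bytes it never got are at least defined. */
    memset(&received, 0, sizeof(received));
    memcpy(&received, &sent, sizeof(sent));
    return received;
}
```

On a compiler that pads the second struct, value8 survives the trip but value16 comes out as something other than 0x0203, even though neither side contains a "bug" in the usual sense.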
In your case the problem is aggravated by the fact that you define test.length as unsigned int in userspace, yet for some reason you change it into __u32 in kernelspace. Your code is problematic even before structure padding; if your userspace defines basic integers as 64-bit, the bug will also inevitably trigger.
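If you insisted on raw structs anyway, the least you could do is share one definition with a fixed-width integer and make the build fail when the layout drifts. A hedged user-space sketch (uint32_t here plays the role __u32 plays in the kernel; the specific assertions are just examples of what one might check):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NAME_LENGTH 50

/* One definition, meant to be shared by both sides. */
struct test {
    uint32_t length;
    char name[MAX_NAME_LENGTH];
};

/* Refuse to compile if this compiler lays the struct out unexpectedly. */
_Static_assert(sizeof(uint32_t) == 4, "length must be exactly 4 bytes");
_Static_assert(offsetof(struct test, name) == 4,
               "unexpected padding before name");
```

This narrows the problem but does not solve it: the checks only cover the compilers you actually build with, and they say nothing about byte order.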
And there's another problem: "Btw, NETLINK_USER is defined as 31" tells me you're following tutorials or code samples long obsolete or written by people who don't know what they are doing. Do you know where that 31 comes from? It's the identifier of your "Netlink family." They define it as 31 because that's the highest possible value it can have (0-31), and therefore, it's the one least likely to collide with other Netlink families defined by the kernel. (Because they are numbered monotonically.) But most careless Netlink users are following the tutorials, and therefore most of their Netlink families identify as 31. Therefore, your kernel module will be unable to coexist with any of them. netlink_kernel_create() will kick you out because 31 is already claimed.
And you might be wondering, "well shit. There are only 32 available slots, 23 of them are already taken by the kernel and there's an unknown but likely large number of additional people wanting to register different Netlink families. What do I do?!"
It's 2020. We don't use Netlink anymore. We use better-Netlink: Generic Netlink.
Generic Netlink uses strings and dynamic integers as family identifiers, and drives you to use Netlink's "attribute" framework by default. (The latter encourages you to serialize and deserialize structures in a portable way, which is the real solution to your original problem.)
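For intuition about why attributes are portable: each one travels as a small header (struct nlattr, a 16-bit length plus a 16-bit type) followed by its payload, padded to a 4-byte boundary, so the receiver never depends on how the sender's compiler laid out a struct. The sketch below builds that layout by hand using the definitions in <linux/netlink.h>. put_attr is a hypothetical helper written for this illustration; in real code you would let the nla_put* functions do this for you.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: appends one attribute (header, payload, padding) to
 * buf at offset off and returns the offset where the next attribute starts.
 * The caller must make sure buf is large enough.
 */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, uint16_t len)
{
    struct nlattr hdr;

    hdr.nla_len = NLA_HDRLEN + len;  /* header + payload, without padding */
    hdr.nla_type = type;

    memcpy(buf + off, &hdr, sizeof(hdr));
    memcpy(buf + off + NLA_HDRLEN, data, len);
    /* Zero the alignment padding so no stray bytes leak out. */
    memset(buf + off + NLA_HDRLEN + len, 0, NLA_ALIGN(len) - len);

    return off + NLA_HDRLEN + NLA_ALIGN(len);
}
```

A receiver walks the buffer by reading each nla_len and nla_type, so fields can be validated one by one, skipped if unknown, or added later without breaking old clients.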
This code needs to be visible to both your userspace client and kernel module (both listings below include it as ../include/protocol.h):
#define SAMPLE_FAMILY "Sample Family"

enum sample_operations {
    SO_TEST, /* from your "struct test" */
    /* List more here for different request types. */
};

enum sample_attribute_ids {
    /* Numbering must start from 1 */
    SAI_LENGTH = 1, /* From your test.length */
    SAI_NAME, /* From your test.name */
    /* This is a special one; don't list any more after this. */
    SAI_COUNT,
#define SAI_MAX (SAI_COUNT - 1)
};
This is the kernel module:
#include <linux/module.h>
#include <linux/version.h>
#include <net/genetlink.h>
#include "../include/protocol.h"
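/*
 * A "policy" is a bunch of rules. The kernel will validate the request's fields
 * match these data types (and other defined constraints) for us.
 * NLA_NUL_STRING (rather than NLA_STRING) makes the kernel reject names that
 * are not NUL-terminated, which printing the name with %s relies on.
 */
struct nla_policy const sample_policy[SAI_COUNT] = {
    [SAI_LENGTH] = { .type = NLA_U32 },
    [SAI_NAME] = { .type = NLA_NUL_STRING },
};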
/*
 * This is the function the kernel calls whenever the client sends SO_TEST
 * requests.
 */
static int handle_test_operation(struct sk_buff *skb, struct genl_info *info)
{
    if (!info->attrs[SAI_LENGTH]) {
        pr_err("Invalid request: Missing length attribute.\n");
        return -EINVAL;
    }
    if (!info->attrs[SAI_NAME]) {
        pr_err("Invalid request: Missing name attribute.\n");
        return -EINVAL;
    }
    pr_info("Printing the length and name: %u, '%s'\n",
            nla_get_u32(info->attrs[SAI_LENGTH]),
            (unsigned char *)nla_data(info->attrs[SAI_NAME]));
    return 0;
}
static const struct genl_ops ops[] = {
    /*
     * This is what tells the kernel to use the function above whenever
     * userspace sends SO_TEST requests.
     * Add more array entries if you define more sample_operations.
     */
    {
        .cmd = SO_TEST,
        .doit = handle_test_operation,
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 2, 0)
        /* Before kernel 5.2, each op had its own policy. */
        .policy = sample_policy,
#endif
    },
};
/* Descriptor of our Generic Netlink family */
static struct genl_family sample_family = {
    .name = SAMPLE_FAMILY,
    .version = 1,
    .maxattr = SAI_MAX,
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 2, 0)
    /* Since kernel 5.2, the policy is family-wide. */
    .policy = sample_policy,
#endif
    .module = THIS_MODULE,
    .ops = ops,
    .n_ops = ARRAY_SIZE(ops),
};
/* Called by the kernel when the kernel module is inserted */
static int test_init(void)
{
    return genl_register_family(&sample_family);
}

/* Called by the kernel when the kernel module is removed */
static void test_exit(void)
{
    genl_unregister_family(&sample_family);
}

module_init(test_init);
module_exit(test_exit);
And here's the userspace client. (You need to install libnl-genl-3; on Debian/Ubuntu, that's sudo apt install libnl-genl-3-dev.)
#include <errno.h>
#include <netlink/genl/ctrl.h>
#include <netlink/genl/genl.h>
#include "../include/protocol.h"
static struct nl_sock *sk;
static int genl_family;
static void prepare_socket(void)
{
    sk = nl_socket_alloc();
    genl_connect(sk);
    genl_family = genl_ctrl_resolve(sk, SAMPLE_FAMILY);
}

static struct nl_msg *prepare_message(void)
{
    struct nl_msg *msg;
    msg = nlmsg_alloc();
    genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, genl_family, 0, 0, SO_TEST, 1);
    /*
     * The nla_put* functions ensure that your data will be stored in a
     * portable way.
     */
    nla_put_u32(msg, SAI_LENGTH, 18);
    nla_put_string(msg, SAI_NAME, "Just a test");
    return msg;
}

int main(int argc, char **argv)
{
    struct nl_msg *msg;
    prepare_socket();
    msg = prepare_message();
    nl_send_auto(sk, msg); /* Send message */
    nlmsg_free(msg);
    nl_socket_free(sk);
    return 0;
}
This code should work starting from kernel 4.10. (I tested it in 4.15.) The kernel API was somewhat different before that.
I left a pocket version of my test environment (with makefiles and proper error handling and everything) in my Dropbox, so you can run it easily.