c linux io-uring

io_uring user_data field is always zero


I'm playing around with io_uring, https://kernel.dk/io_uring.pdf, to see if it can be used for async file I/O for logging. This is a simple program that opens a file, stats the file, and then reads the first 4k from the file. This program runs to completion successfully when the file exists and is readable. But the user_data field in the completion queue entry is always zero. The documentation for io_uring says:

user_data is common across op-codes, and is untouched by the kernel. It's simply copied to the completion event, cqe, when a completion event is posted for this request.

Since the completions are not ordered the user_data field is needed to match completions with submissions. If the field is always zero then how can it be used?

#include <iostream>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <liburing.h>
#include <stdlib.h>

int main() {
  struct io_uring ring;
  // see man io_uring_setup for what this does
  auto ret = io_uring_queue_init(64, &ring, 0);

  if (ret) {
    perror("Failed to initialize uring.");
    exit(1);
  }

  std::cout << "I/O uring initialized successfully. " << std::endl;

  auto directory_fd = open("/tmp", O_RDONLY);
  if (directory_fd < 0) {
    perror("Failed to open /tmp.");
    exit(1);
  }

  struct io_uring_sqe *submission_queue_entry = io_uring_get_sqe(&ring);
  submission_queue_entry->user_data = 100;
  io_uring_prep_openat(submission_queue_entry, directory_fd, "stuff", O_RDONLY, 0);


  submission_queue_entry = io_uring_get_sqe(&ring);
  submission_queue_entry->user_data = 1000;
  struct statx statx_info;
  io_uring_prep_statx(submission_queue_entry, directory_fd, "stuff", 0, STATX_SIZE, &statx_info);

  //TODO: what does this actually return?
  auto submit_error = io_uring_submit(&ring);
  if (submit_error != 2) {
    std::cerr << strerror(submit_error) << std::endl;
    exit(2);
  }

  int file_fd = -1;
  uint32_t responses = 0;
  while (responses != 2) {
    struct io_uring_cqe *completion_queue_entry = 0;
    auto wait_return = io_uring_wait_cqe(&ring, &completion_queue_entry);
    if (wait_return) {
      std::cerr << "Completion queue wait error. " << std::endl;
      exit(2);
    }

    std::cout << "user data " << completion_queue_entry->user_data << " entry ptr " << completion_queue_entry << " ret " << completion_queue_entry->res << std::endl;
    std::cout << "size " << statx_info.stx_size << std::endl;
    if (completion_queue_entry->res > 0) {
      file_fd = completion_queue_entry->res;
    }
    // Mark the entry as seen only after we have finished reading from it.
    io_uring_cqe_seen(&ring, completion_queue_entry);
    responses++;
  }


  submission_queue_entry = io_uring_get_sqe(&ring);
  submission_queue_entry->user_data = 66666;
  char buf[1024 * 4];
  io_uring_prep_read(submission_queue_entry, file_fd, buf,  1024 * 4,  0);
  io_uring_submit(&ring);
  struct io_uring_cqe* read_entry = 0;
  auto read_wait_rv = io_uring_wait_cqe(&ring, &read_entry);
  if (read_wait_rv) {
    std::cerr << "Error waiting for read to complete." << std::endl;
    exit(2);
  }
  std::cout << "Read user data " << read_entry->user_data << " completed with " << read_entry->res << std::endl;
  if (read_entry->res < 0) {
    std::cout << "Read error " << strerror(-read_entry->res) << std::endl;
  }
}

Output

I/O uring initialized successfully.
user data 0 entry ptr 0x7f4e3158c140 ret 5
size 1048576
user data 0 entry ptr 0x7f4e3158c150 ret 0
size 1048576
Read user data 0 completed with 4096


Solution

  • What happens if you try to set user_data after your calls to io_uring_prep_openat()/io_uring_prep_statx()?

    I ask because a Google search for io_uring_prep_statx suggests it comes from the liburing library.

    Searching the liburing source for io_uring_prep_openat leads us to a definition of io_uring_prep_openat() in liburing.h:

    static inline void io_uring_prep_openat(struct io_uring_sqe *sqe, int dfd,
                        const char *path, int flags, mode_t mode)
    {
        io_uring_prep_rw(IORING_OP_OPENAT, sqe, dfd, path, mode, 0);
        sqe->open_flags = flags;
    }
    

    Searching the liburing source for io_uring_prep_statx leads to a definition of io_uring_prep_statx():

    static inline void io_uring_prep_statx(struct io_uring_sqe *sqe, int dfd,
                    const char *path, int flags, unsigned mask,
                    struct statx *statxbuf)
    {
        io_uring_prep_rw(IORING_OP_STATX, sqe, dfd, path, mask,
                    (__u64) (unsigned long) statxbuf);
        sqe->statx_flags = flags;
    }
    

    Chasing the calls gets us to the definition of io_uring_prep_rw:

    static inline void io_uring_prep_rw(int op, struct io_uring_sqe *sqe, int fd,
                        const void *addr, unsigned len,
                        __u64 offset)
    {
        sqe->opcode = op;
        sqe->flags = 0;
        sqe->ioprio = 0;
        sqe->fd = fd;
        sqe->off = offset;
        sqe->addr = (unsigned long) addr;
        sqe->len = len;
        sqe->rw_flags = 0;
        sqe->user_data = 0;
        sqe->__pad2[0] = sqe->__pad2[1] = sqe->__pad2[2] = 0;
    }
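
The effect of that `sqe->user_data = 0` assignment can be reproduced without a kernel or liburing at all. The following is a minimal sketch, not liburing code: `FakeSqe` and `fake_prep` are hypothetical stand-ins that copy only the relevant behaviour of io_uring_prep_rw() (resetting user_data to 0), to show why the order of the two steps matters.

```cpp
#include <cstdint>

// Hypothetical stand-in for struct io_uring_sqe; only the fields needed here.
struct FakeSqe {
    std::uint8_t  opcode = 0;
    std::int32_t  fd = -1;
    std::uint64_t user_data = 0;
};

// Hypothetical stand-in for io_uring_prep_rw(): like the version quoted above,
// it fills in the request and resets user_data to 0.
inline void fake_prep(FakeSqe &sqe, std::uint8_t op, std::int32_t fd) {
    sqe.opcode = op;
    sqe.fd = fd;
    sqe.user_data = 0;
}

// Order used in the question: tag first, prepare second. The tag is lost.
inline std::uint64_t tag_then_prep(std::uint64_t tag) {
    FakeSqe sqe;
    sqe.user_data = tag;
    fake_prep(sqe, 1, 3);
    return sqe.user_data;
}

// Suggested order: prepare first, tag second. The tag survives.
inline std::uint64_t prep_then_tag(std::uint64_t tag) {
    FakeSqe sqe;
    fake_prep(sqe, 1, 3);
    sqe.user_data = tag;
    return sqe.user_data;
}
```

With these stand-ins, tag_then_prep(100) yields 0 and prep_then_tag(100) yields 100, which mirrors the zeros seen in the question's output.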
    

    PS: I notice you have a comment that says

      //TODO: what does this actually return?
      auto submit_error = io_uring_submit(&ring);
    

    Well, if we search the liburing repo for "int io_uring_submit" we come across the following in src/queue.c:

    /*
     * Submit sqes acquired from io_uring_get_sqe() to the kernel.
     *
     * Returns number of sqes submitted
     */
    int io_uring_submit(struct io_uring *ring)
    

    This ultimately chains down to the io_uring_enter() syscall, so you can read its man page for more detail.
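
As a rough sketch of how that return value might be checked: io_uring_submit() returns the number of sqes submitted on success and a negative errno value on failure, so the error text comes from strerror(-ret) rather than strerror(ret). The helper name `describe_submit_result` and its `expected` parameter are made up for this illustration. The function only interprets an integer and does not call liburing itself.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: interprets the int returned by io_uring_submit().
// `expected` is the number of sqes the caller queued before submitting.
// Returns an empty string on success, otherwise a human-readable message.
inline std::string describe_submit_result(int ret, int expected) {
    if (ret < 0) {
        // liburing reports failures as -errno, so negate before strerror().
        return std::string("submit failed: ") + std::strerror(-ret);
    }
    if (ret != expected) {
        return "short submit: " + std::to_string(ret) + " of " +
               std::to_string(expected) + " sqes submitted";
    }
    return std::string();
}
```

In the question's program this could be called as describe_submit_result(io_uring_submit(&ring), 2), printing the message and exiting if it is not empty.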

    Update: The questioner says moving the assignment solved their problem, so I spent some time thinking about the text they quoted. On a closer reading there is a subtlety in the words "untouched by the kernel":

    user_data is common across op-codes, and is untouched by the kernel. It's simply copied to the completion event, cqe, when a completion event is posted for this request.

    There's a similar statement earlier in the document (note "The kernel will not touch this field"):

    The cqe contains a user_data field. This field is carried from the initial request submission, and can contain any information that the application needs to identify said request. One common use case is to have it be the pointer of the original request. The kernel will not touch this field, it's simply carried straight from submission to completion event.

    These statements describe the io_uring kernel interface, but io_uring_prep_openat() and io_uring_prep_statx() are liburing functions. liburing is a userspace helper library, so the statements above about user_data need not hold for every liburing function.

    If the field is always zero then how can it be used?

    The field is being zeroed by certain liburing preparation helper functions. It can therefore only be set (and retain the new value) after those helper functions have been called. The io_uring kernel syscalls themselves behave as the quote describes.
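
The quoted documentation mentions using user_data to carry a pointer to the original request. A minimal sketch of that round trip follows, assuming a hypothetical `Request` struct and the made-up helper names `to_user_data` and `from_user_data`. None of these come from liburing, which offers its own io_uring_sqe_set_data() and io_uring_cqe_get_data() helpers for the same purpose.

```cpp
#include <cstdint>

// Hypothetical per-request bookkeeping owned by the application.
struct Request {
    int kind;          // e.g. 0 = open, 1 = statx, 2 = read
    const char *name;  // label used when reporting the completion
};

// Pack a pointer into a 64-bit value suitable for sqe->user_data.
inline std::uint64_t to_user_data(Request *req) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(req));
}

// Recover the pointer from the 64-bit value read back from cqe->user_data.
inline Request *from_user_data(std::uint64_t user_data) {
    return reinterpret_cast<Request *>(static_cast<std::uintptr_t>(user_data));
}
```

The value from to_user_data() would be assigned to the sqe only after the io_uring_prep_*() call, for the reason given in the answer, and the Request object has to stay alive until its completion has been handled.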