
Linking Linux kernel's object file built with `-mcmodel=kernel` into userspace application: what to do with GS segment used instead of FS?


I have an out-of-tree Linux kernel module (for Intel x86_64). Its source code files are compiled into .o object files, which are then linked into the kernel .ko loadable file using the standard documented process for such modules:

foo.c → foo.o → foo.ko // done by make -C <tree> M=<dir>

Now, I want to prepare and run unit tests for functions inside one of the source files of that module. To be able to run the tests fully in user space, I provide mock implementations for linking dependencies of that module (meaning that the test will not be calling privileged stuff such as CPL0 instructions, while in CPL3). I also link it with the test framework's entrypoint to produce a new executable binary that can then be run:

foo.o + test_mocks.c + test_entrypoint.c → foo_test // configured by CMake

Important: foo.o is reused as-is from the kernel module build step; it is the same object code that would go into production.

However, the test application foo_test crashes early in the prologue of the first function invoked from foo.o. Debugging shows that it segfaults on the instruction mov %gs:0x28,%rax. GS is not set (contains 0), so naturally there is no mapping for the address being read.

Another instruction that gets inserted into the test binary, closer to the function's epilogue, is sub %gs:0x28,%rdx. Both of these reminded me of thread-local storage, except that it should be using the FS segment instead of GS.

foo.o comes from the module build process, so it must be some kernel-specific backend option passed to the compiler that causes this.

By going through compiler options used by the kbuild process I have discovered that it is -mcmodel=kernel that causes GS to be used instead of FS.

GCC's documentation on this flag only mentions its effect on the address range, not on the choice of segment register:

-mcmodel=kernel Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.

I have verified that removing this option when compiling a single .c file into the .o file indeed causes all %gs instances to be replaced by %fs in the object file.

How can I allow functions from the module's object file to be called without crashing?

Some options that I have or am considering.

  1. Rebuilding foo.c into a new, "userspace-friendly" object file is troublesome because of other things kbuild does to the environment (-nostdinc, a bunch of -isystem, redefining true and false etc.). Besides, it would miss the point of testing the binary code that is actually used in production.
  2. Initialize GS in the test's setup phase with the current FS value, so that the mov %gs instructions won't crash? I am not planning to use thread-local storage in the tests, so I assume FS stays unmodified during the lifetime of the application. Can it be done entirely in userspace?
  3. Make the compiler/linker of the test binary aware that the code should use GS instead of FS? How?
  4. Using another testing framework that abstracts away some of the woes? I only looked at KUnit, and it seemed disproportionately complicated relative to my goal. Configuring, building and running another kernel just to be able to call a function from a module is unreasonable; besides, how would I provide precise test doubles for my test scenarios (e.g., memory allocation failure, specific hardware errata behavior, specific scheduling sequence etc.)?

Solution

  • The mov %gs:0x28,%rax and sub %gs:0x28,%rdx instructions come from -fstack-protector-strong, which is on by default, so the simplest and easiest thing to do would be to compile with -fno-stack-protector.

    You could always have that file compiled with the stack protector disabled. But if you're modifying build options for unit-testing anyway, overriding the code model with -mcmodel=small (the user-space default) while keeping all of the other build options like -nostdinc would be the other way to solve your problem.

    Or compile with -S and use sed or something to replace %gs with %fs.


    For using the existing object file (foo.o), high 2G (-mcmodel=kernel) code should be able to run in the low 2G (normal user-space with -fno-pie -no-pie). It might use sign-extended 32-bit absolute addresses (e.g. of arrays, for instructions like mov array(,%rdi,4), %eax); that will work correctly in the low or high 2G. It won't use mov $string_literal, %esi since that zero-extends to 64-bit. Code built for user-space with -mcmodel=small -fno-pie won't work in the kernel because it will use that.
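
To make the sign-extension point concrete, here is a small C sketch of the arithmetic. The helper names are made up for illustration, and the two addresses in the note below are assumed examples (a typical non-PIE user-space address and a typical kernel-text address), not values taken from the question.

```c
#include <stdint.h>

/* Models a 32-bit displacement or sign-extended imm32 in an addressing
 * mode: bit 31 is copied into bits 32..63.
 * (The uint32_t -> int32_t conversion is implementation-defined in ISO C;
 * GCC and Clang define it as two's-complement wrap-around.) */
static uint64_t sign_extend32(uint32_t v)
{
    return (uint64_t)(int64_t)(int32_t)v;
}

/* Models a write to a 32-bit register such as %esi:
 * the upper 32 bits of the 64-bit register become zero. */
static uint64_t zero_extend32(uint32_t v)
{
    return (uint64_t)v;
}
```

For 0x00400000 both helpers return 0x400000, so either encoding reaches the same address in the low 2G. For 0x81000000, sign extension gives 0xffffffff81000000 (in the high 2G) while zero extension gives 0x0000000081000000, which is why code relying on zero extension cannot address the kernel's range.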

    For the GS vs. FS issue:

    Most code won't modify the FS (or GS) base after thread startup. Or after program startup if your code is single-threaded. In that case it should be _start or libc init code that actually sets the FS base.

    You don't want to set %gs's or %fs's value; the segment register value is just a 16-bit selector, usually the null selector (0) in 64-bit code. The 64-bit segment base address (fs.base) is set via an MSR in a system call, or wrfsbase. There won't even be a GDT or LDT entry with the segment base you want, so no value for mov %eax, %gs could work. (GDT/LDT entries only have room for a 32-bit segment base; AMD64 chose to use an alternate mechanism instead of extending the format, and 64-bit mode treats CS/DS/ES/SS bases as 0.)

    I think your kernel code will probably expect the same layout of things in memory at the GS base as user-space would expect at FS, so just setting GS.base = FS.base before any code from foo.o runs should do the trick (top of main, or in a wrapper function for a new thread).

    If you're on a new enough kernel and CPU for FSGSBASE instructions to work in user-space (Ivy Bridge or later, and Linux 5.9 or later):

    asm("rdfsbase %%rax ; wrgsbase %%rax" ::: "memory", "rax");   // %% because this is extended asm
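
Executing rdfsbase or wrgsbase when the kernel has not enabled them raises SIGILL, so it may be worth checking first. The following is only a sketch: the function names are hypothetical, and it assumes the kernel advertises FSGSBASE through bit 1 of AT_HWCAP2, as described in the kernel's documentation on FS/GS usage in user space.

```c
#include <elf.h>        /* AT_HWCAP2 */
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval */

/* Assumption: older headers may not define this bit yet. */
#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1 << 1)
#endif

/* Non-zero if the kernel lets user space execute FSGSBASE instructions. */
static int fsgsbase_usable(void)
{
    return (getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE) != 0;
}

/* Copies the FS base into the GS base.
 * Returns 0 on success, -1 if FSGSBASE is unavailable (nothing is changed). */
static int copy_fs_base_to_gs_fast(void)
{
    uint64_t base;

    if (!fsgsbase_usable())
        return -1;
    __asm__ volatile("rdfsbase %0\n\t"
                     "wrgsbase %0"
                     : "=r"(base) : : "memory");
    return 0;
}

/* Stand-ins for the loads the stack protector emits. */
static uint64_t load_fs_0x28(void)
{
    uint64_t v;
    __asm__ volatile("mov %%fs:0x28, %0" : "=r"(v));
    return v;
}

static uint64_t load_gs_0x28(void)
{
    uint64_t v;
    __asm__ volatile("mov %%gs:0x28, %0" : "=r"(v));
    return v;
}
```

If copy_fs_base_to_gs_fast() returns 0, load_gs_0x28() should read the same canary as load_fs_0x28(). If it returns -1, the GS base is untouched and a %gs-relative load would still fault.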
    

    Otherwise, the traditional way is to call arch_prctl with ARCH_GET_FS and ARCH_SET_GS. (https://man7.org/linux/man-pages/man2/arch_prctl.2.html).
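
A minimal sketch of that approach, assuming an x86-64 Linux target. It goes through syscall(2) rather than relying on a libc wrapper for arch_prctl, and the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>     /* ARCH_GET_FS, ARCH_SET_GS */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall */

/* Makes %gs-relative accesses see the same memory as %fs-relative ones.
 * Returns 0 on success, -1 if either arch_prctl call fails. */
static int mirror_fs_base_into_gs(void)
{
    unsigned long fs_base = 0;

    /* ARCH_GET_FS stores the current FS base at the given address. */
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fs_base) != 0)
        return -1;
    /* ARCH_SET_GS takes the new base by value. */
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, fs_base) != 0)
        return -1;
    return 0;
}

/* Stand-ins for the stack-protector loads in user-space and kernel code. */
static uint64_t canary_via_fs(void)
{
    uint64_t v;
    __asm__ volatile("mov %%fs:0x28, %0" : "=r"(v));
    return v;
}

static uint64_t canary_via_gs(void)
{
    uint64_t v;
    __asm__ volatile("mov %%gs:0x28, %0" : "=r"(v));
    return v;
}
```

Calling mirror_fs_base_into_gs() at the top of the test's main (or in a thread-start wrapper), before anything from the kernel object file runs, is the intended use. After it succeeds, canary_via_gs() and canary_via_fs() should return the same value.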

    See also "How to access segment register without linking libc.so?" for some hand-written asm that calls arch_prctl with ARCH_SET_FS to set up FS appropriately for normal user-space code that wants to use %fs:0x28, but if you're making system calls you should just use C.


    If your main is part of foo.o, you'll need to keep GCC from generating stack-cookie code in main's prologue before you've had a chance to set the GS base, unless you do something tricky like writing an asm main that sets it and tail-calls your real_main C function. If so, make sure main doesn't contain any arrays or variable-sized objects, or any locals whose address is taken, since those are what trigger -fstack-protector-strong. And make sure any functions under test that have local arrays don't inline into it. (Hmm, but you wouldn't want to mark them __attribute__((noinline)) in general, only prevent inlining into main. Maybe __attribute__((optimize("O0"))) on main could work.)

    But hopefully your unit-test main is compiled separately so it's not bloating the kernel binary.