I have an out-of-tree Linux kernel module (for Intel x86_64). Its source code files are compiled into .o object files, which are then linked into the kernel .ko loadable file using the standard documented process for such modules:
foo.c → foo.o → foo.ko // done by make -C <tree> M=<dir>
Now, I want to prepare and run unit tests for functions inside one of the source files of that module. To be able to run the tests fully in user space, I provide mock implementations for linking dependencies of that module (meaning that the test will not be calling privileged stuff such as CPL0 instructions, while in CPL3). I also link it with the test framework's entrypoint to produce a new executable binary that can then be run:
foo.o + test_mocks.c + test_entrypoint.c → foo_test // configured by CMake
Important: foo.o is reused as-is from the kernel module build step; it is the same object code that would go into production.
However, the test application foo_test crashes early in the prologue of the first function invoked from foo.o. Debugging shows that it segfaults on the instruction mov %gs:0x28,%rax. GS is not set (contains 0), so naturally there is no mapping for the address being read.
Another instruction that gets inserted into the test binary closer to the function's epilogue is sub %gs:0x28,%rdx. Both of these reminded me of thread-local storage, except that it should be using the FS segment instead of GS.
foo.o comes from the module build process, so it must be some kernel-specific backend option passed to the compiler that causes this.
By going through the compiler options used by the kbuild process I have discovered that it is -mcmodel=kernel that causes GS to be used instead of FS.
GCC's documentation on this flag only mentions effects on address range but not on the segment's choice:
-mcmodel=kernel Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.
I have verified that removing this option when compiling a single .c file into the .o file indeed causes all %gs instances to be replaced by %fs in the object file.
How to allow functions from the module file to be called without crashing?
Some options that I have or am considering.
[...] -nostdinc, a bunch of -isystem, redefining true and false etc.). Besides, it would miss the point of testing the binary code that is actually used in production.
[...] mov %gs instructions won't crash? I am not planning to use thread-local storage in the tests, so I assume FS stays unmodified during the lifetime of the application. Can it be done entirely in userspace?
The mov %gs:0x28,%rax and sub %gs:0x28,%rdx instructions are from -fstack-protector-strong, which is on by default, so the simplest thing to do would be to compile with -fno-stack-protector. (There is no -fno-stack-protector-strong; the one negative form disables every stack-protector level.)
You could always have that file compiled with stack-protector disabled. But if you're modifying build options for unit-testing, -mcmodel=small (plus -fpie if the test executable is a PIE), while keeping all of the other build options like -nostdinc, would be the other way to solve your problem.
Or compile with -S and use sed or something to replace %gs with %fs.
For using the existing .ko, high 2G (mcmodel=kernel) code should be able to run in the low 2G (normal user-space with -fno-pie -no-pie). It might use sign-extended 32-bit absolute addresses (e.g. of arrays, for instructions like mov array(,%rdi,4), %eax); that will work correctly in the low or high 2G. It won't use mov $string_literal, %esi since that zero-extends to 64-bit. Code built for user-space with -mcmodel=small -fno-pie won't work in the kernel because it will use that.
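To make the sign-extension versus zero-extension point concrete, here is a small arithmetic sketch. The helper names are made up for illustration; they only model how a 32-bit value becomes a 64-bit address in each case, they do not inspect real machine code.

```c
#include <stdint.h>

/* Models a 32-bit absolute displacement, as in `mov array(,%rdi,4), %eax`:
   the CPU sign-extends it to 64 bits when forming the address.
   (The uint32_t -> int32_t conversion wraps on GCC and Clang.) */
uint64_t sign_extend32(uint32_t imm)
{
    return (uint64_t)(int64_t)(int32_t)imm;
}

/* Models `mov $imm32, %esi`: writing a 32-bit register
   zero-extends the value into the full 64-bit register. */
uint64_t zero_extend32(uint32_t imm)
{
    return (uint64_t)imm;
}
```

For a low-2G address such as 0x00400000 both helpers return the same value, which is why code built with -mcmodel=kernel can still run in the low 2G. For a high-2G address whose low 32 bits are 0x80000000, only sign extension gives back 0xffffffff80000000, which is why zero-extending user-space code cannot run in the kernel.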
For the GS vs. FS issue:
Most code won't modify the FS (or GS) base after thread startup. Or after program startup if your code is single-threaded. In that case it should be _start or libc init code that actually sets the FS base.
You don't want to set the value of %gs or %fs itself; the segment register value is just a 16-bit selector, usually the null selector (0) in 64-bit code. The 64-bit segment base address (fs.base) is set by the kernel writing an MSR on behalf of a system call, or with wrfsbase. There won't even be a GDT or LDT entry with the segment base you want, so no value for mov %eax, %gs could work. (GDT/LDT entries only have room for a 32-bit segment base; AMD64 chose to use an alternate mechanism instead of extending the format, and 64-bit mode treats CS/DS/ES/SS bases as 0.)
I think your kernel code will probably expect the same layout of things in memory at the GS base as user-space would expect at FS, so just setting GS.base = FS.base before any .ko code runs should do the trick (top of main, or in a wrapper function for a new thread).
If you're on a new enough kernel and CPU for FSGSBASE instructions to work in user-space (Ivy Bridge or later, and Linux 5.9 or later):
asm("rdfsbase %%rax ; wrgsbase %%rax" ::: "memory", "rax");
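A slightly more defensive sketch of the same idea checks whether the kernel has enabled FSGSBASE for user space before executing the instructions, since they fault otherwise. The function name is hypothetical, and the fallback definition of HWCAP2_FSGSBASE assumes the value used by the kernel's asm/hwcap2.h header.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */

#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1 << 1)   /* assumed: value from the kernel's asm/hwcap2.h */
#endif

/* Hypothetical helper: copies FS.base into GS.base with FSGSBASE instructions.
   Returns 0 on success, or -1 if the kernel does not advertise FSGSBASE
   for user space (in which case nothing is changed). */
int copy_fs_base_to_gs_fsgsbase(void)
{
    if (!(getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE))
        return -1;

    uint64_t base;
    /* rdfsbase writes the register, wrgsbase then reads the same register. */
    __asm__ volatile("rdfsbase %0\n\t"
                     "wrgsbase %0"
                     : "=r"(base)
                     :
                     : "memory");
    return 0;
}
```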
Otherwise, the traditional way is to call arch_prctl with ARCH_GET_FS and ARCH_SET_GS (https://man7.org/linux/man-pages/man2/arch_prctl.2.html).
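A minimal sketch of the arch_prctl route, assuming x86-64 Linux and going through syscall(2) rather than relying on a libc wrapper. The function names are made up for illustration; the two small readers are only there to check that %gs:0x28 and %fs:0x28 see the same stack cookie afterwards.

```c
#include <asm/prctl.h>     /* ARCH_GET_FS, ARCH_SET_GS */
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall */

/* Hypothetical helper: point GS.base at the same address as FS.base, so that
   %gs:0x28 reads the stack cookie libc already placed at %fs:0x28.
   Returns 0 on success, -1 on failure (errno is set by syscall). */
int copy_fs_base_to_gs(void)
{
    unsigned long fs_base = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fs_base) != 0)
        return -1;
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, fs_base) != 0)
        return -1;
    return 0;
}

/* Read the 8-byte stack-protector slot through FS. */
unsigned long cookie_via_fs(void)
{
    unsigned long v;
    __asm__ volatile("mov %%fs:0x28, %0" : "=r"(v));
    return v;
}

/* Read the same slot through GS; only safe once GS.base has been set. */
unsigned long cookie_via_gs(void)
{
    unsigned long v;
    __asm__ volatile("mov %%gs:0x28, %0" : "=r"(v));
    return v;
}
```

The intended use is to call copy_fs_base_to_gs() at the top of the test's main, or at the start of each new thread's entry function, before anything from foo.o runs. This file should be compiled with normal user-space options so that its own stack-protector code uses %fs.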
See also "How to access segment register without linking libc.so?" for some hand-written asm that calls arch_prctl with ARCH_SET_FS to set up FS appropriately for normal user-space code that wants to use %fs:0x28, but if you're making system calls you should just use C.
If your main is part of the .ko, you'll need to avoid having GCC generate stack-cookie code in main's prologue before you could get GS set, unless you do something tricky like write an asm main that does it and tailcalls your real_main C function.
If so, make sure main doesn't contain any variable-sized objects, or any arrays bigger than 16 bytes, or whatever the threshold is for stack-protector-strong. And make sure any functions under test don't inline into it if they have local arrays. (Hmm, but you wouldn't want to mark them __attribute__((noinline)) in general, only for inlining into main. Maybe __attribute__((optimize("O0"))) on main could work.)
But hopefully your unit-test main is compiled separately so it's not bloating the kernel binary.