linux-kernel cryptography compiler-optimization simd fpu

Generate and optimize FP / SIMD code in the Linux kernel for files that contain kernel_fpu_begin()?


I know that it's forbidden to use any kind of floating-point code in the kernel, and that we should never use any GCC flag that could generate FP / SIMD instructions. But what about source files (especially arch/x86/crypto/*) that use kernel_fpu_begin() and kernel_fpu_end()?

Example 1, example 2.

I have an ancient Intel Core 2 Duo CPU that I use for my 64-bit Linux Kernel and in the main Makefile I use the following C flags:

# Target specific Flags
KBUILD_CFLAGS   += \
           -m64 \
           -march=core2 \
           -mtune=core2 \
           -mfpmath=sse \
           -msoft-float \
           -mno-fp-ret-in-387 \
           -mno-mmx \
           -mno-sse \
           -mno-sse2 \
           -mno-sse3 \
           -mno-ssse3

# FPU Flags
FPU_CFLAGS := $(KBUILD_CFLAGS) \
           -mhard-float \
           -mfp-ret-in-387 \
           -mmmx \
           -msse \
           -msse2 \
           -msse3 \
           -mssse3 \
           -ftree-vectorize

and in files where kernel_fpu_begin() is present, I pass the FPU_CFLAGS in their Makefiles like this:

CFLAGS_sha512_ssse3_glue.o := $(FPU_CFLAGS)

Is this correct and will it optimize the FP / SIMD code? Or is it not needed and this implementation could even break the state of the FPU / SIMD?


Solution

  • Is this correct

    No, absolutely do not do this. These options tell GCC it can use SIMD/FP instructions anywhere in this compilation unit, including before kernel_fpu_begin() or after kernel_fpu_end(), or in functions that never call kernel_fpu_begin().

    For example, it could emit a movdqu load or store to copy 16 bytes of a struct, corrupting user-space XMM register state before kernel_fpu_begin() has saved it.
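To make the struct-copy hazard concrete, here is a minimal user-space sketch. The struct and function names are hypothetical and do not come from the kernel. There is no floating-point math anywhere in it, yet with SSE enabled the compiler is free to implement the 16-byte assignment as a load and store through an XMM register.

```c
#include <stdint.h>

/* Hypothetical 16-byte structure; nothing here is floating point. */
struct key_block {
    uint64_t lo;
    uint64_t hi;
};

/*
 * A plain struct assignment. With SSE enabled the compiler may copy
 * all 16 bytes through an XMM register (e.g. with movdqu); with
 * -mno-sse it has to use general-purpose registers instead.
 */
void copy_key_block(struct key_block *dst, const struct key_block *src)
{
    *dst = *src;
}
```

Comparing the output of gcc -O2 -S with and without -mno-sse should show whether an XMM register is used for the copy; the exact instructions depend on the compiler version and tuning options.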

    and will it optimize the FP / SIMD code?

    No, kernel code that uses kernel_fpu_begin() also uses inline asm to run SIMD instructions. That will emit SIMD instructions without any help from the compiler.

    In theory, some kernel code could instead put a function attribute such as __attribute__((target("sse2"))) on a helper function that is only called between kernel_fpu_begin() and kernel_fpu_end(), and write that helper with intrinsics or rely on auto-vectorization. But I think Linux prefers inline asm over that approach.
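A rough user-space sketch of the inline-asm pattern described above. The kernel_fpu_begin() / kernel_fpu_end() definitions here are stand-ins that only count nesting so the example can be compiled outside the kernel; in the kernel the real functions are declared in <asm/fpu/api.h>. The helper xor_block_simd() and the wrapper xor_block() are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/*
 * User-space stand-ins (assumptions for this sketch only). The real
 * kernel functions save and restore the user-space FPU/SIMD state.
 */
static int fpu_section_depth;
static void kernel_fpu_begin(void) { fpu_section_depth++; }
static void kernel_fpu_end(void)   { fpu_section_depth--; }

/* Hypothetical helper: XOR 16 bytes of src into dst. */
static void xor_block_simd(uint8_t *dst, const uint8_t *src)
{
#if defined(__x86_64__) || defined(__i386__)
    /*
     * The SSE2 instructions are written out by hand, so the compiler
     * does not have to generate any SIMD code itself.
     */
    __asm__ volatile(
        "movdqu (%0), %%xmm0\n\t"
        "movdqu (%1), %%xmm1\n\t"
        "pxor   %%xmm1, %%xmm0\n\t"
        "movdqu %%xmm0, (%0)"
        :
        : "r"(dst), "r"(src)
        : "xmm0", "xmm1", "memory");
#else
    /* Portable fallback so the sketch still builds off x86. */
    for (int i = 0; i < 16; i++)
        dst[i] ^= src[i];
#endif
}

/* Glue function: SIMD use is confined to the begin/end pair. */
void xor_block(uint8_t *dst, const uint8_t *src)
{
    kernel_fpu_begin();
    xor_block_simd(dst, src);
    kernel_fpu_end();
}
```

The point of the structure is that the only XMM accesses sit between the begin and end calls, which is something the compiler cannot guarantee once it is allowed to use SSE for the whole file.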

    The kernel wouldn't bother to include kernel_fpu_begin() / kernel_fpu_end() calls if it were getting zero benefit from them. BTW, you can disassemble the relevant .ko kernel modules and see that they do in fact contain SIMD instructions that use XMM registers: use objdump -drwC -Mintel foo.ko.