x86 sse instruction-set

How do I enable SSE for my freestanding bootable code?


(This question was originally about the CVTSI2SD instruction, which I thought didn't work on the Pentium M CPU. The real cause is that I'm using a custom OS and need to enable SSE manually.)

I have a Pentium M CPU and a custom OS which so far has used no SSE instructions, but I now need to use them.

Trying to execute any SSE instruction raises interrupt 6, illegal opcode (on Linux this would cause a SIGILL, but this isn't Linux). The Intel architectures software developer's manual (referred to below as IASDM) calls it #UD - Invalid Opcode (UnDefined Opcode).

Edit: Peter Cordes identified the right cause and pointed me to the solution, which I summarize below:

If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.

Indeed, the IASDM mentions this:

If an operating system did not provide adequate system level support for SSE, executing an SSE or SSE2 instructions can also generate #UD.

Peter Cordes pointed me to the OSDev wiki page on SSE, which describes how to enable SSE by writing to both the CR0 and CR4 control registers:

clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
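
The four steps above can be sketched in C roughly as follows. This is only a minimal sketch, not the wiki's code: the helper names and the `MY_KERNEL_RING0` guard macro are made up for illustration, the inline assembly assumes GCC or Clang syntax, and it assumes the CPU is already known to support SSE (the Pentium M does).

```c
#include <stdint.h>

/* Bit positions as listed in the steps above. */
#define CR0_MP          ((uintptr_t)1 << 1)   /* monitor coprocessor */
#define CR0_EM          ((uintptr_t)1 << 2)   /* x87 emulation; must be clear for SSE */
#define CR4_OSFXSR      ((uintptr_t)1 << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  ((uintptr_t)1 << 10)  /* OS handles SIMD FP exceptions */

/* Pure helpers: compute the new register values without touching hardware. */
static inline uintptr_t cr0_with_sse(uintptr_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

static inline uintptr_t cr4_with_sse(uintptr_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* MY_KERNEL_RING0 is a hypothetical macro: define it only when building
 * code that really runs at privilege level 0 (or in real mode). */
#if defined(MY_KERNEL_RING0) && (defined(__i386__) || defined(__x86_64__))
static inline void enable_sse(void)
{
    uintptr_t reg;

    __asm__ volatile("mov %%cr0, %0" : "=r"(reg));
    reg = cr0_with_sse(reg);
    __asm__ volatile("mov %0, %%cr0" : : "r"(reg) : "memory");

    __asm__ volatile("mov %%cr4, %0" : "=r"(reg));
    reg = cr4_with_sse(reg);
    __asm__ volatile("mov %0, %%cr4" : : "r"(reg) : "memory");
}
#endif
```

Keeping the bit arithmetic in separate pure functions makes it possible to check it on an ordinary hosted build, since the `mov` to and from control registers would fault outside ring 0.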

Note that, to be able to write to these registers from protected mode, you need to be at privilege level 0. The answer to this question explains how to test it: in protected mode, that is, when bit 0 (PE) of CR0 is set to 1, bits 0 and 1 of the CS selector hold the current privilege level, and both should be 0.
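
As a rough illustration of that test (a sketch with hypothetical helper names, GCC/Clang inline assembly assumed), the check boils down to masking the low two bits of CS. The CR0 value is passed in as a parameter here, and the sketch ignores virtual-8086 mode, where CS is not a selector.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE ((uintptr_t)1 << 0)   /* protection enable */

/* In protected mode the low two bits of the CS selector are the
 * current privilege level (0 = most privileged, 3 = least). */
static inline unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* True when writes to CR0/CR4 should be allowed: either PE is clear
 * (real mode) or we are in protected mode at privilege level 0.
 * Limitation: virtual-8086 mode is not handled. */
static inline bool can_write_control_regs(uintptr_t cr0, uint16_t cs)
{
    if ((cr0 & CR0_PE) == 0)
        return true;
    return cpl_from_cs(cs) == 0;
}

#if defined(__i386__) || defined(__x86_64__)
/* Reading CS itself is not a privileged operation. */
static inline uint16_t read_cs(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}
#endif
```

For example, a typical flat kernel code selector such as 0x08 gives privilege level 0, while a user-mode selector such as 0x1B gives 3.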

Finally, the custom OS must properly handle XMM registers during context switches, by saving and restoring them when necessary.
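
One common way to do that saving and restoring is the FXSAVE/FXRSTOR instruction pair, which is what the CR4.OSFXSR bit advertises support for. The following is only a sketch under that assumption: `struct task`, `switch_fpu_state` and the other names are hypothetical, and the inline assembly assumes GCC or Clang.

```c
#include <stdint.h>

/* FXSAVE/FXRSTOR use a 512-byte memory area that must be 16-byte aligned. */
struct fx_state {
    uint8_t bytes[512];
} __attribute__((aligned(16)));

/* Hypothetical per-task structure; a real kernel would have more fields. */
struct task {
    struct fx_state fpu;
};

#if defined(__i386__) || defined(__x86_64__)
static inline void fx_save(struct fx_state *s)
{
    __asm__ volatile("fxsave %0" : "=m"(*s));
}

static inline void fx_restore(const struct fx_state *s)
{
    __asm__ volatile("fxrstor %0" : : "m"(*s));
}

/* Would be called from the (hypothetical) scheduler on a task switch.
 * next->fpu must hold a valid image, e.g. one captured earlier with
 * fx_save(), before it is restored. */
static inline void switch_fpu_state(struct task *prev, struct task *next)
{
    fx_save(&prev->fpu);
    fx_restore(&next->fpu);
}
#endif
```

This saves eagerly on every switch, which is the simplest policy; it is a design choice for the sketch, not something the question prescribes.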


Solution

  • If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case all instructions that touch xmm regs will fault.

    http://wiki.osdev.org/SSE explains how to alter CR0 and CR4 to allow SSE instructions to run on bare metal without #UD.

    Note that VEX prefixes won't decode in real mode, so you can't use AVX there even if your CPU supports it. You have to be in protected or long mode if you want AVX on CPUs that support it.


    My first thought on your old version of the question was that you might have compiled your program with -mavx, -march=sandybridge or equivalent, causing the compiler to emit the VEX-encoded version of everything.

    CVTSI2SD   xmm1, r32/m32          ; SSE2
    VCVTSI2SD  xmm1, xmm2, r/m32      ; AVX
    

    See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.

    Most real-world kernels are built with options that stop the compiler from using SSE or x87 instructions on its own, for example gcc -mgeneral-regs-only. With older GCC versions, use -mno-sse -mno-mmx and avoid the float and double types entirely so that no x87 code is generated. This way kernels only have to save/restore integer registers on interrupts and system calls, and handle the SIMD/FP state only on a full context switch to a different user-space task. Before that option existed and was used, Linux kernel code that used double could silently corrupt user-space state!

    If you have a freestanding program that isn't trying to context-switch between user-space tasks, go ahead and let the compiler use SSE / AVX.


    Related: Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?) has some details about how to check for support for AVX and AVX512 (which also introduce new architectural state, so the OS has to set a bit or the HW will fault). It's coming at it from the other angle, but the links should indicate how to activate / disable AVX support.
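
As a rough sketch of the AVX check mentioned above (helper names are hypothetical; it assumes GCC or Clang with their `<cpuid.h>` header): the CPU must report AVX and OSXSAVE in CPUID leaf 1, and XCR0 must have both the SSE and AVX state bits set.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has set CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE           (1u << 1)   /* XMM state enabled */
#define XCR0_AVX           (1u << 2)   /* upper YMM state enabled */

/* Pure decision function: takes CPUID.1:ECX and XCR0 as plain values. */
static inline bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    uint32_t need_cpuid = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    uint64_t need_xcr0 = XCR0_SSE | XCR0_AVX;

    return (cpuid1_ecx & need_cpuid) == need_cpuid
        && (xcr0 & need_xcr0) == need_xcr0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

static inline uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

static inline bool avx_usable_here(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return false;   /* XGETBV would fault if OSXSAVE is not enabled */
    return avx_usable(ecx, read_xcr0());
}
#endif
```

On bare metal, your own code is the "OS" here, so a false result usually means CR4.OSXSAVE or the XCR0 bits have not been set up yet.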