In the x86 Intel reference manual it says:
"The overflow flag is set only if the single-shift forms of the instruction are used. [...]"
But when I have the following scenario:
xor eax, eax
mov al, 0b11000000
shl al, 2
;content of al: 00000000
Here the high bit of the result is not the same as the carry-out (CF = 1), and yet the overflow flag is not set.
I don't get why this is the correct behavior. Why is the overflow flag set only when single shifts are used?
OF=undefined for shift counts other than 1; results in practice depend on your CPU. See below for my theory of how it's set on my Intel CPU.
This design decision makes some sense, letting the hardware be slightly simpler.
Detecting 2's complement signed overflow properly would require checking that all the bits shifted out matched the new MSB. That's different from just checking the final bit shifted out the way CF does now, so it would require some extra internal state for a one-bit-at-a-time shifter like the original 8086 used.
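To make that concrete, here's a software sketch (not how the hardware would do it) of what a proper signed-overflow check for a multi-bit shift has to establish: undoing the shift with an arithmetic right shift recovers the original value only if every bit shifted out matched the new MSB. The register choices and the 1..7 count range are just assumptions for the example.

mov bl, al      ; keep a copy of the input
shl al, cl      ; the shift whose signed overflow we want to detect (assume 1 <= CL <= 7)
mov dl, al
sar dl, cl      ; arithmetic shift back in; fills from the result's MSB
cmp dl, bl      ; equal <=> every bit shifted out matched the new MSB <=> no signed overflow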
That's perhaps what Stephen Morse (architect of the 8086 ISA) was thinking when he made the design choice for 8086. His book, The 8086 Primer, is available for free on his web site, and confirms (pg 96) that 8086 leaves OF undefined for the variable-count opcode. (For 8086, apparently that includes shl al, cl with CL=1, unlike how Intel currently documents it.) The section about shift instructions and what they're for (pg 64-66) doesn't mention OF, only CF.
Having to check all the bits shifted out might also make a barrel shifter more expensive, but Morse was less likely to be thinking of that.
IDK why Morse didn't define OF as always being set in some specific way, perhaps according to CF not matching the current MSB; that's probably not useful for larger counts, but it would still be meaningful for counts of 1, and the ALU is already required to produce the last bit shifted out for CF. Perhaps it's simply that 8086 didn't define anything for OF in the variable-count opcode, even if the count happened to be 1.
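Just to illustrate that hypothetical rule (it's not what any CPU documents), a "CF doesn't match the current MSB" condition for an 8-bit shift could be computed like this; the register choices are arbitrary, and a non-zero count is assumed so the flags actually get written:

shl al, cl      ; CF = last bit shifted out (assume CL != 0)
setc bl         ; bl = CF
mov dl, al
shr dl, 7       ; dl = MSB of the 8-bit result
xor bl, dl      ; bl = 1 <=> CF != MSB(result), which is the count=1 meaning of OF for shl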
Note that some CPUs in practice produce OF=1 for some cases with a count greater than 1. For example, my i7-6700k Skylake does with 0x7f << 2.
The documentation says
OF flag is affected only for 1-bit shifts (see “Description” above); otherwise, it is undefined.
Undefined is not the opposite of affected; that would be "unaffected". It's always set to some value, they just don't document how the CPU picks 0 vs. 1.
Actually "unmodified" would force reading and merging with the old FLAGS value on modern CPUs for immediate shift counts of 2 or more, just like variable-count shifts already have to do in case the count is 0, so it's good that it's not specified that way. (shl reg, cl is 3 uops on Sandybridge-family because of the need to leave FLAGS unmodified in case CL & 31 == 0.) That would be an unwanted data dependency, unlike now where shifts write all the flags unless the count is 0.
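As a side note, BMI2's shlx shows the flag-free alternative: it doesn't write any flags at all, so there is nothing to merge, and it's a single uop even with a variable count on CPUs that have BMI2 (Haswell and later).

shlx eax, edx, ecx      ; eax = edx << (ecx & 31); FLAGS not written at all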
I tested my CPU with this NASM program
_start:
    mov cl, 7
    mov dl, 0x7f        ; GDB set $dl = 0xc0 or whatever after this
.loop:
    mov eax, edx
    shl al, cl
    dec cl              ; set a breakpoint here to look at EFLAGS after every continue
    jnz .loop
;; fall off the end; I'm only single-stepping this in GDB anyway
Assemble + link into a static executable with nasm + ld, run it under GDB, and use layout reg / layout next. Use starti and si.
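Something like this, assuming a 64-bit Linux setup; the file name shift-of.asm is just a placeholder:

nasm -felf64 shift-of.asm
ld -o shift-of shift-of.o
gdb ./shift-of        # then starti, layout reg, and si (or break on the dec and use c)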
My Skylake CPU does set OF=1 for shl al, cl with AL=0x7f, CL=2 (or 1 or any non-zero count), or with AL=0x80. But it never sets it for AL=0x3 with any count, or for AL=0xc0 (0b1100_0000).
My current guess to explain the behaviour is that OF is set as if it were a shift by 1, i.e. OF = (input[MSB] != input[MSB-1]), looking only at the input bits.
This makes sense; it gives the correct result in the case where the paper spec requires a specific result, and it's cheap to implement. (The OF output would still have to come from different bits depending on the operand-size.)
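A cheap software model of that guess (purely speculation about the microarchitecture, not documented behaviour): predict OF from a 1-bit left shift of the input, which is the same thing as adding the input to itself. That prediction matches all four test values above: 0x7f and 0x80 set it, 0x3 and 0xc0 don't.

mov bl, al      ; the input value
add bl, bl      ; same as a 1-bit left shift: OF = (input[7] != input[6])
seto bl         ; bl = the OF my Skylake appears to produce for any non-zero count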
Of course, other microarchitectures from other vendors can be different. As could pure-software x86 emulators which still comply with the on-paper spec.