Look at this snippet:
int main() {
    double v = 1.1;
    return v == 1.1;
}
On 32-bit compilations, this program returns 0 if -fexcess-precision=standard is specified. Without it, the program returns 1.
Why is there a difference? Looking at the assembly code (godbolt), it seems that with -fexcess-precision=standard, gcc uses 1.1 as a long double constant (it loads the constant as TBYTE). Why does it do so?
First I thought it was a bug, but I found this gcc bug comment; it seems that this behavior is intentional, or at least not unexpected.
Is this a QoI issue? I understand that the comparison is executed using long double precision, but still, my 1.1 is not a long double literal. The weird thing is that if I cast the 1.1 in the comparison to double (which it already is), the issue goes away.
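For reference, here is the cast variant (this one returns 1 even with -fexcess-precision=standard):
int main() {
    double v = 1.1;
    return v == (double)1.1; /* the cast makes the issue go away */
}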
(Another weird thing is that GCC does the load and compare twice; see the duplicated fucomip instructions. It does this even in 64-bit mode. I understand that in my godbolt link optimization is turned off, but still, there is only one comparison in my code, so why does GCC compare twice?)
Here's the asm code, without -fexcess-precision=standard:
main:
        push    ebp
        mov     ebp, esp
        and     esp, -8
        sub     esp, 16
        fld     QWORD PTR .LC0
        fstp    QWORD PTR [esp+8]
        fld     QWORD PTR [esp+8]
        fld     QWORD PTR .LC0
        fucomip st, st(1)
        fstp    st(0)
        setnp   al
        mov     edx, 0
        fld     QWORD PTR [esp+8]
        fld     QWORD PTR .LC0
        fucomip st, st(1)
        fstp    st(0)
        cmovne  eax, edx
        movzx   eax, al
        leave
        ret
.LC0:
        .long   -1717986918
        .long   1072798105
And here it is with it:
main:
        push    ebp
        mov     ebp, esp
        and     esp, -8
        sub     esp, 16
        fld     QWORD PTR .LC0
        fstp    QWORD PTR [esp+8]
        fld     QWORD PTR [esp+8]
        fld     TBYTE PTR .LC1
        fucomip st, st(1)
        setnp   al
        mov     edx, 0
        fld     TBYTE PTR .LC1
        fucomip st, st(1)
        fstp    st(0)
        cmovne  eax, edx
        movzx   eax, al
        leave
        ret
.LC0:
        .long   -1717986918
        .long   1072798105
.LC1:
        .long   -858993459
        .long   -1932735284
        .long   16383
In C, it is permitted (as indicated via FLT_EVAL_METHOD) that a floating point literal may hold a value with more precision than its type permits, and that, at the same time, floating point operators are evaluated in a higher precision than the operand types permit.
In that case v == 1.1 can be false because the literal 1.1, although of type double, will not be rounded to double precision, but == still compares it in higher precision against the stored value of v, which must be rounded to a value representable by double.
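A minimal sketch of where these roundings happen, assuming a 32-bit x86 build with FLT_EVAL_METHOD == 2, compiled with -fexcess-precision=standard:
#include <stdio.h>

int main(void) {
    double v = 1.1;                   /* initialization rounds the value to double */
    printf("%d\n", v == 1.1);         /* 0: rounded v vs. literal kept at long double precision */
    printf("%d\n", v == (double)1.1); /* 1: the cast forces the literal to be rounded, too */
    return 0;
}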
In C++, although it is still permitted for floating point operations to be evaluated in higher precision, the value of a floating point literal still needs to be rounded to a value representable in its type.
However, this interacts incorrectly with specification incorporated from C, such as FLT_EVAL_METHOD, and deviates from C for seemingly no reason, so the question of the precision of floating point literal values is still an open issue; see https://cplusplus.github.io/CWG/issues/2752.html and https://github.com/cplusplus/papers/issues/1584.
Without the -fexcess-precision=standard flag, GCC doesn't behave standard-conforming at all and may even keep the value of v at a higher precision than its type permits, which neither the C nor the C++ standard allows. (Assignment, casting and initialization should always force a rounding to a value representable in the actual type.) With that it can happen that v == 1.1 is true again, by virtue of neither the literal nor the value of v as retrieved from the literal ever being rounded to a representable double value.
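As a sketch, assigning the literal to a named variable is one way to force the rounding the standard requires; with -fexcess-precision=standard this returns 1, while the default fast mode gives no such guarantee:
int main(void) {
    double v = 1.1;   /* initialization rounds the literal to double */
    double lit = 1.1; /* initialization forces the same rounding here */
    return v == lit;  /* both operands hold the same rounded value: returns 1 */
}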
All of this is typically relevant on e.g. a 32-bit x86 compilation, where FLT_EVAL_METHOD will often be defined as 2, meaning that the higher precision mentioned above should always be chosen as if the type were long double. This is to support keeping double as a 64-bit type while performing floating point operations in 80-bit precision on the x87 FPU. Normally this choice for FLT_EVAL_METHOD makes the behavior deterministic, in the sense that it is possible to tell exactly where a rounding is applied, but note that GCC's default (-fexcess-precision=fast) will not be consistent in whether and where rounding is applied at all.
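FLT_EVAL_METHOD is exposed by <float.h>, so a given build can simply be asked which evaluation method it uses; a minimal sketch:
#include <float.h>
#include <stdio.h>

int main(void) {
    /* 0: evaluate in the operand types; 1: evaluate float and double as double;
       2: evaluate all operations and constants as long double */
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    return 0;
}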
Given that FLT_EVAL_METHOD is 2, and given the choices for the floating point types, following the C rules, v == 1.1 evaluating to false is the only correct standard-conforming behavior. For C++ that is different, but it isn't clear whether that is a defect in the C++ standard. It is therefore somewhat understandable why GCC would follow the behavior required in C.
The fact that v == 1.1 can evaluate to false is very intentional, and programmers need to be aware of the excess precision behavior, unless they make sure that their code only needs to support implementations with FLT_EVAL_METHOD == 0, where no excess precision will be applied.
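A hedged sketch of how code could make that dependency explicit instead of silently assuming a particular evaluation method:
#include <float.h>

int main(void) {
    double v = 1.1;
#if FLT_EVAL_METHOD == 0
    return v == 1.1;         /* no excess precision: the plain comparison is safe */
#else
    return v == (double)1.1; /* the cast discards any excess precision of the literal */
#endif
}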