I'm interested in how the compiler casts a float to an int for code like:
float x_f = 3.1415;
int x = (int)x_f;
I'm especially interested in speed. Is it super-fast, like a built-in processor instruction, or does it need extra computation?
I also wonder whether it makes a difference if the float always contains an exact integer (e.g. x_f = 3.0000).
EDIT: This question is about gcc on Intel x86 processors.
EDIT2: Does it change anything if x_f = 3.0?
It depends a lot on the particular cpu. Since you're interested in x86, the original 387 fpu has an instruction to convert float to integer, but it can't be used directly because it uses the default rounding mode, whereas conversions in C are required to truncate, not round. Thus, the following function:
int f(float x)
{
return x;
}
compiles to (with gcc -O3 -fno-asynchronous-unwind-tables, to avoid crud in the asm):
.text
.p2align 4,,15
.globl f
.type f, @function
f:
subl $8, %esp
fnstcw 6(%esp)
movw 6(%esp), %ax
movb $12, %ah
movw %ax, 4(%esp)
flds 12(%esp)
fldcw 4(%esp)
fistpl (%esp)
fldcw 6(%esp)
movl (%esp), %eax
addl $8, %esp
ret
What it's doing is saving, changing, and restoring the fpu control word in order to switch the rounding mode to truncation for the duration of the conversion.
On the other hand, if you're building for a target that has SSE available for floating point, you get:
.text
.globl f
.type f, @function
f:
cvttss2si 4(%esp), %eax
ret
So, it really depends.
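If you want to see the truncating and rounding SSE conversions side by side from C, the intrinsics in <xmmintrin.h> expose both. A minimal sketch, x86 only; the wrapper names sse_truncate and sse_round are made up for this example, and the rounding variant assumes the default round-to-nearest mode is in effect:

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86 targets only */

/* cvttss2si: convert with truncation, which is what a C cast requires. */
int sse_truncate(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* cvtss2si: convert using the current rounding mode
   (round-to-nearest unless it has been changed). */
int sse_round(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

For 2.7f the first returns 2 and the second returns 3; for 3.0f both return 3.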
Finally, since you mentioned you're particularly interested in the case where the value is already a whole number: this does not make any difference to the conversion itself. The cpu operations to convert almost surely don't care. However, in this case you can cheat: since you know the input is a whole number, rounding and truncation produce the same result, and you can use lrintf rather than casting or implicitly converting to int. This should be a major improvement on x86 targets not using sse for math, especially if the compiler recognizes lrintf and inlines it. Here is the same function, using lrintf(x) instead of x, with the -fno-math-errno option added (otherwise gcc assumes libm might want to set errno and thus doesn't replace the call):
f:
pushl %eax
flds 8(%esp)
fistpl (%esp)
movl (%esp), %eax
popl %edx
ret
Note that gcc did a bad job of compiling this function; it could have generated:
f:
flds 4(%esp)
fistpl 4(%esp)
movl 4(%esp), %eax
ret
This is valid because the argument space on the stack belongs to the callee and may be clobbered at will. And even if it weren't, movl (%esp),%eax ; popl %edx when you don't care what ends up in edx is an idiotic way of writing popl %eax...