I was just starting out with C (K. N. King's C Programming) when I came across the following passage:
By default, floating constants are stored as double-precision numbers. In other words, when a C compiler finds the constant 57.0 in a program, it arranges for the number to be stored in memory in the same format as a double variable. This rule generally causes no problems, since double values are converted automatically to float when necessary.
Suppose I have the following statements:
float x = 5.0; // 1
float y = 5.0f; // 2
What does the passage mean in this example? What is the difference between statement 1 and 2 with regard to storage of values in bits?
In the first statement, is 5.0 first saved as a double and then allocated as a float to x?
The author’s assertion that “By default, floating constants are stored as double-precision numbers” likely arises from this paragraph in the C standard, C 2018 6.4.4.2 4:
An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.
That paragraph makes it clear that a floating-point constant in source code is by default (meaning it does not have a suffix) interpreted as a double. But the author’s assertion that the value is “stored” is imprecise. The C standard tells us how to interpret source code, but it does not require that constants be stored. Even in the abstract machine model the C standard uses to specify C semantics, before optimization is considered, it is only specified that the values of variables are stored in the memory of the variables, not that the values of constants are stored.
Thus, I would expect the compiler to do its best job of converting an unsuffixed constant to double (see footnote 1), but I would not necessarily expect it to store it anywhere other than in its own memory while working with it and generating the program. It might end up storing it in the program’s data if needed, but it could generate it in instructions or fold it into other parts of an expression.
“This rule generally causes no problems, since double values are converted automatically to float when necessary.”
I would phrase it as saying problems caused by this automatic conversion are rare. Stating that it “generally” causes no problems might cause a student to take that as a general rule, rather than being cautious of when problems can occur. In situations where floating-point constants are carefully engineered for a particular task, suffixes should be used to ensure the constant has exactly the desired value.
In your example with five, float x = 5.0; and float y = 5.0f; will produce the same value in x and y because five is representable in both float and double. However, consider this code:
#include <stdio.h>

int main(void)
{
    float x = 0x9.876548000000000000001p0;
    float y = 0x9.876548000000000000001p0f;
    printf("%a\n", x);
    printf("%a\n", y);
}
In my C implementation, x and y get different values, and this prints:
0x1.30eca8p+3
0x1.30ecaap+3
The reason is this:
In float x = 0x9.876548000000000000001p0;, 9.876548000000000000001₁₆ is converted to double. The final 1 bit is several bits below what is representable in a double, so it is rounded down, producing 9.876548₁₆. Then this double is converted to float for storing in x. The low bit of the 4 is the last bit that fits in a float, so the first bit of the 8 is the first bit that does not fit. This is halfway between two values representable in a float, 9.87654₁₆ and 9.87655₁₆. In case of a tie, the rule is to round to the even low bit, so the result of the conversion is 9.87654₁₆, and that is stored in x. Printing it produces 0x1.30eca8p+3, which is another representation of that number.
In float y = 0x9.876548000000000000001p0f;, 9.876548000000000000001₁₆ is converted to float. Again, the low bit of the 4 fits, so the part that does not fit is the 8000000000000001. Because of the 1, this is more than halfway from 9.87654₁₆ to 9.87655₁₆, so there is no tie, and rounding to the nearest value produces 9.87655₁₆, and that is what is stored in y. Printing it produces 0x1.30ecaap+3, another representation of the same value.
Footnote 1: The C standard is lax about how floating-point constants are converted to floating-point values. C 2018 6.4.4.2 7 says translation-time conversion “should” match the conversion done by library functions such as strtod, and 7.22.1.3 9 says strtod “should” be correctly rounded if the input does not have too many digits (at most DECIMAL_DIG digits) or, if it does, should equal the result of converting one of the two decimal numbers with DECIMAL_DIG digits that immediately bound the value. This is a legacy due to the fact that converting values with exponents such as +300 or -300 from scratch nominally requires computing with hundreds of digits, considered too much of a burden for early compilers and computers. Modern algorithms have been devised for this, so the standard could require correct rounding in all cases.