[SOLVED] IEEE Floating Point Numbers: Are they not "precise"? (read: well-defined)

IEEE Floating Point Numbers: Are they not "precise"? (read: well-defined)

For a long time I thought floating point arithmetic is well-defined and different platforms making the same calculation should get the same results. (Given the same rounding modes.)

Yet Microsoft's SQL Server deems any calculation performed with floating point to have an "imprecise" result, meaning you can't have an index on it. This suggests to me that the guys at Microsoft thought there's a relevant catch regarding floating point.

So what is the catch?

EDIT: You can have indexes on floats in general, just not on computed columns. Also, geodata uses floating point types for the coordinates and those are obviously indexed, too. So it's not a problem of reliable comparison in general.

Solution

Floating-point arithmetic is well defined by the IEEE 754 standard. In the documentation you point out, Microsoft has apparently not chosen to adhere to the standard.

There are a variety of issues that makes floating-point reproducibility difficult, and you can find Stack Overflow discussions about them by searching for “[floating-point] reproducibility”. However, most of these issues are about lack of control in high-level languages (the individual floating-point operations are completely reproducible and specified by IEEE 754, and the hardware provides sufficient IEEE 754 conformance, but the high-level language specification does not adequately map language constructs to specific floating-point operations), differences in math library routines (functions such as sin and log are “hard” to compute in some sense, and vendors implement them without what is called correct rounding, so each vendor’s routines have slightly different error characteristics than others), multithreading and other issues allow operations to be performed in different orders, thus yielding different results, and so on.

In a single system such as Microsoft’s SQL Server, Microsoft presumably could have controlled these issues if they wanted to. Still, there are issues to consider. For example, a database system may have a sum function that computes the sum of many things. For speed, you may wish the sum implementation to have the flexibility to add the elements in any order, so that it can take advantage of multiprocessing or of adding the elements in whatever order they happen to be stored in. But adding floating-point data using elementary add operations of the same floating-point format has different results depending on the order of the elements. To make the sum reproducible, you have to specify the order of operation or use extra precision or other techniques, and then performance suffers.

So, not making floating-point arithmetic reproducible is a choice that is made, not a consequence of any lack of specification for floating-point arithmetic.

Another problem for database purposes is that even well defined and completely specified floating-point arithmetic has NaN values. (NaN, an abbreviation for Not a Number, represents a floating-point datum that is not a number. A NaN is produced as the result of an operation that has no mathematical result (such as the real square root of a negative number). NaNs act as placeholders so that floating-point operations can continue without interruption, and an application can complete a set of floating-point operations and then take action to replace or otherwise deal with any NaNs that arose.) Because a NaN does not represent a number, it is not equal to anything, not even itself. Comparing two NaNs for equality produces false, even if the NaNs are represented with exactly the same bits. This is a problem for databases, because NaNs cannot be used as a key for looking up records because a NaN search key will never equal a NaN in the key field of a record. Sometimes this is deal with by defining two different ordering relations—one is the usual mathematical comparison, which defines less than, equal to, and greater than for numbers (and for which all three are false for NaNs), and a second which defines a sort order and which is defined for all data, including NaNs.

It should be noted that each floating-point datum that is not a NaN represents a certain number exactly. There is no imprecision in a floating-point number. A floating-point number does not represent an interval. Floating-point operations approximate real arithmetic in that they return values approximately equal to the exact mathematical results, while floating-point numbers are exact. Elementary floating-point operations are exactly specified by IEEE 754. Lack of reproducibility arises in using different operations (including the same operations with different precisions), in using operations in different orders, or in using operations that do not conform to the IEEE 754 standard.