Tags: floating-point, ieee-754

Floating Point: Why does the implicit 1 change the value of the fractional part?


I was reading about the floating point implementation from the comments of a ziglings.org exercise, and I came across this info about it.

// Floating further:
//
// As an example, Zig's f16 is an IEEE 754 "half-precision" binary
// floating-point format ("binary16"), which is stored in memory
// like so:
//
//         0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0
//         | |-------| |-----------------|
//         |  exponent     significand
//         |
//          sign
//
// This example is the decimal number 3.140625, which happens to
// be the closest representation of Pi we can make with an f16
// due to the way IEEE-754 floating points store digits:
//
//   * Sign bit 0 makes the number positive.
//   * Exponent bits 10000 are a scale of 16.
//   * Significand bits 1001001000 are the decimal value 584.
//
// IEEE-754 saves space by modifying these values: the value
// 01111 is always subtracted from the exponent bits (in our
// case, 10000 - 01111 = 1, so our exponent is 2^1) and our
// significand digits become the decimal value _after_ an
// implicit 1 (so 1.1001001000 or 1.5703125 in decimal)! This
// gives us:
//
//     2^1 * 1.5703125 = 3.140625

There is something I am having trouble understanding here. Why is 1.1001001000 not 1.584? Why does it become 1.5703125 when 1001001000 is 584? Is there something the . does here?

Basically, why does the implicit one change the value of the fractional part?


Solution

  • Yes, binary 1001001000 is 584. But, remember, binary is base two, so when you move the . around, you're multiplying or dividing by 2, not 10.

    So if you move the . one place to the left you get 100100100.0 which is 584 ÷ 2 = 292. If you move it two places that's 10010010.00 which is 584 ÷ 4 = 146, or five places is 10010.01000 which is 584 ÷ 32 = 18.25. And 0.1001001000 is 584 ÷ 1024 = 0.5703125, so 1.1001001000 is indeed 1.5703125.
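The shifts above are easy to check: moving the binary point k places to the left is exactly division by 2^k. A quick Python sketch (Python used purely for illustration; the question is about Zig's f16, but the arithmetic is the same):

```python
# 1001001000 in binary is 584; shifting the binary point left by k
# places divides the value by 2**k.
bits = 0b1001001000
print(bits)            # 584
print(bits / 2**1)     # 292.0   -> 100100100.0
print(bits / 2**2)     # 146.0   -> 10010010.00
print(bits / 2**5)     # 18.25   -> 10010.01000
print(bits / 2**10)    # 0.5703125 -> 0.1001001000
print(1 + bits / 2**10)  # 1.5703125 -> 1.1001001000
```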

    The other way to do it is by straight application of the definition of a positional number system:

1.1001001000₂ = 1 × 2^0 + 1 × 2^-1 + 1 × 2^-4 + 1 × 2^-7 = 1 + 0.5 + 0.0625 + 0.0078125 = 1.5703125.
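That positional expansion can also be written as a one-liner, summing bit_i × 2^-i over the fractional bits (again a Python sketch for illustration):

```python
# Apply the positional-number-system definition: the i-th bit after
# the binary point (1-indexed) is worth 2**-i.
frac_bits = "1001001000"
value = 1 + sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(frac_bits))
print(value)  # 1.5703125
```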

    Everybody who's worked with computers for a while learns the basic correspondence between binary and decimal integers:

    001    1
    010    2
    011    3
    100    4
    101    5
    110    6
    111    7
    

But when you're working with fractions, to the right of the binary point, it's all different:

    .001    .125    ⅛
    .010    .25     ¼
    .011    .375    ⅜
    .100    .5      ½
    .101    .625    ⅝
    .110    .75     ¾
    .111    .875    ⅞
    

    There's no magic here: in all 7 cases I've moved the binary point three bits to the left, which is the same as dividing by 2^3 = 8, so we end up with 1/8, 2/8 = 1/4, 3/8, etc.
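The whole table falls out of that one rule. A short loop reproducing it (illustrative Python):

```python
# Each 3-bit fraction .bbb is just its integer value divided by 2**3 = 8.
for n in range(1, 8):
    bits = format(n, "03b")
    print(f".{bits} = {n}/8 = {n / 8}")
# .001 = 1/8 = 0.125
# .010 = 2/8 = 0.25
# ...
# .111 = 7/8 = 0.875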

    So you'll notice that the binary fractions are all halves and quarters and eighths. (If we went past 3 bits, we'd start seeing sixteenths and thirty-seconds and sixty-fourths, etc.) This leads to one of the quirks of binary floating point, namely that it's impossible to perfectly represent the ordinary-looking decimal fraction 0.1, or 1/10.
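Both points can be checked end to end. Python's struct module understands the same binary16 format (format code "e"), so we can decode the exact bit pattern from the question, 0 10000 1001001000 (hex 0x4248), and we can use Decimal to see that 0.1 really is stored inexactly (illustrative Python, not part of the Zig exercise):

```python
import struct
from decimal import Decimal

# Decode the f16 bits from the question: sign 0, exponent 10000,
# significand 1001001000 -> hex 0x4248.
value, = struct.unpack(">e", (0x4248).to_bytes(2, "big"))
print(value)  # 3.140625

# 0.1 has no finite binary expansion; Decimal reveals the nearest
# representable double, which is close to but not exactly 1/10.
print(Decimal(0.1))
```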