I was reading about the floating-point implementation in the comments of a ziglings.org exercise, and I came across this info:
// Floating further:
//
// As an example, Zig's f16 is an IEEE 754 "half-precision" binary
// floating-point format ("binary16"), which is stored in memory
// like so:
//
// 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0
// | |-------| |-----------------|
// | exponent significand
// |
// sign
//
// This example is the decimal number 3.140625, which happens to
// be the closest representation of Pi we can make with an f16
// due to the way IEEE-754 floating points store digits:
//
// * Sign bit 0 makes the number positive.
// * Exponent bits 10000 are a scale of 16.
// * Significand bits 1001001000 are the decimal value 584.
//
// IEEE-754 saves space by modifying these values: the value
// 01111 is always subtracted from the exponent bits (in our
// case, 10000 - 01111 = 1, so our exponent is 2^1) and our
// significand digits become the decimal value _after_ an
// implicit 1 (so 1.1001001000 or 1.5703125 in decimal)! This
// gives us:
//
// 2^1 * 1.5703125 = 3.140625
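The arithmetic in that comment can be checked with a short script. This is an illustrative sketch in Python rather than Zig; the bit pattern, field widths, and bias constant are taken straight from the comment above.

```python
# Decode the f16 bit pattern 0 10000 1001001000 from the comment by hand.
bits = 0b0100001001001000

sign = (bits >> 15) & 0x1        # 1 sign bit
exponent = (bits >> 10) & 0x1F   # 5 exponent bits -> 0b10000 = 16
significand = bits & 0x3FF       # 10 significand bits -> 0b1001001000 = 584

# Subtract the bias 0b01111 = 15 and restore the implicit leading 1
# (584/1024 is the fractional part, so 1 + 584/1024 = 1.5703125):
value = (-1) ** sign * 2 ** (exponent - 15) * (1 + significand / 1024)
print(value)  # 3.140625
```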
There is something I am having trouble understanding here. Why is 1.1001001000 not 1.584? Why does it become 1.5703125 when 1001001000 is 584? Does the . do something here?
Basically, why does the implicit one change the value of the fractional part?
Yes, binary 1001001000 is 584. But remember, binary is base two, so when you move the . around, you're multiplying or dividing by 2, not 10.

So if you move the . one place to the left you get 100100100.0, which is 584 ÷ 2 = 292. If you move it two places, that's 10010010.00, which is 584 ÷ 4 = 146, or five places is 10010.01000, which is 584 ÷ 32 = 18.25. And 0.1001001000 is 584 ÷ 1024 = 0.5703125, so 1.1001001000 is indeed 1.5703125.
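Each of those point shifts is just a division by a power of two, which a quick Python session confirms:

```python
# Moving the binary point left by k places divides the value by 2**k.
n = 0b1001001000          # 584
print(n / 2)              # 292.0      -> 100100100.0
print(n / 4)              # 146.0      -> 10010010.00
print(n / 32)             # 18.25      -> 10010.01000
print(n / 1024)           # 0.5703125  -> 0.1001001000
print(1 + n / 1024)       # 1.5703125  -> 1.1001001000
```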
The other way to do it is by straight application of the definition of a positional number system:
1.1001001000₂ = 1×2^0 + 1×2^−1 + 1×2^−4 + 1×2^−7 = 1 + 0.5 + 0.0625 + 0.0078125 = 1.5703125.
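That positional expansion is easy to verify directly, summing one term per set bit:

```python
# Place values of the set bits in 1.1001001000: positions 0, -1, -4, -7.
value = 1 * 2**0 + 1 * 2**-1 + 1 * 2**-4 + 1 * 2**-7
print(value)  # 1.5703125
```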
Everybody who's worked with computers for a while learns the basic correspondence between binary and decimal integers:
001 1
010 2
011 3
100 4
101 5
110 6
111 7
But when you're working with fractions, to the right of the binary point, it's all different:
.001 .125 ⅛
.010 .25 ¼
.011 .375 ⅜
.100 .5 ½
.101 .625 ⅝
.110 .75 ¾
.111 .875 ⅞
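Both tables can be reproduced from the same bit patterns: read each pattern as an integer, then shift it three places to the right of the binary point by dividing by 8.

```python
# Same three bits, read as an integer and as a binary fraction (.bbb = n/8).
for bits in ["001", "010", "011", "100", "101", "110", "111"]:
    n = int(bits, 2)
    print(bits, n, n / 8)   # e.g. "101 5 0.625"
```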
There's no magic here: in all 7 cases I've moved the binary point three bits to the left, which is the same as dividing by 2^3 = 8, so we end up with 1/8, 2/8 = 1/4, 3/8, etc.
So you'll notice that the binary fractions are all halves and quarters and eighths. (If we went past 3 bits, we'd start seeing sixteenths and thirty-seconds and sixty-fourths, etc.) This leads to one of the quirks of binary floating point, namely that it's impossible to perfectly represent the ordinary-looking decimal fraction 0.1, or 1/10.
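You can see that quirk directly in Python (which uses 64-bit binary floats): `Decimal` shows the exact value actually stored for 0.1, and the rounding errors mean 0.1 + 0.2 is not quite 0.3.

```python
from decimal import Decimal

# The double nearest to 0.1 is slightly larger than 1/10:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The per-value rounding errors don't cancel out:
print(0.1 + 0.2 == 0.3)  # False
```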