If you were reading news about developments in graphics in the 1990s, you might have followed Jim Blinn's column in IEEE Computer Graphics & Applications, "Jim Blinn's Corner." In the summer of 1997 you would have opened your issue to a column called "Floating-Point Tricks". All of the tricks are based on the following chestnut of knowledge Blinn passes along from Steve Gabriel and Gideon Yuval, two old hands at Microsoft:
"If you only deal with positive numbers, the bit pattern of a floating point number, interpreted as an integer, gives a piecewise linear approximation to the logarithm function"
My question is, what is it about floating point numbers which gives rise to this characteristic?
// These will result in undefined behavior
// and are not the 'right' way to do this,
// but I am following Blinn 1997 here.
int AsInteger(float f) {
    return *(int *)&f;
}

float AsFloat(int i) {
    return *(float *)&i;
}
// FISR demonstrating the logarithmic nature of floating-point numbers.
// The constant chosen is not "magic" but simply the one that restores
// the exponent bias lost when the number is shifted as an integer.
float BlinnISR(float x) {
    int i;
    float y;
    // 0x5F400000 = 1598029824,
    // which is (AsInteger(1.0f) + (AsInteger(1.0f) >> 1))
    i = 0x5F400000 - (AsInteger(x) >> 1);
    y = AsFloat(i);
    // Blinn uses 1.47 and 0.47, not the 1.5 and 0.5
    // you would get from differentiating
    y = y * (1.47f - 0.47f * x * y * y);
    return y;
}
The above, a recreation of Blinn's example code, shows how it works practically for the inverse square root.
Remember that 1 / sqrt(x) is the same as x^(-1/2). So if we have access to a table of logarithms, one way to find the inverse square root is to take the natural log of x^(-1/2), which gives you -1/2 * ln(x). If you exponentiate this you get 1 / sqrt(x) = e^((-1/2) * ln(x)).
Right-shifting the integer representation of the float bits gives us an approximation of a division by two of the logarithm of the input floating-point number (this is our 1/2 * ln(x)). This is subtracted from the restoring constant, here just 0x5F400000, not yet 'magic', so we have essentially -1/2 * ln(x). Reinterpreting that as a float serves as our exponentiation step. We then have a number which serves as a good first input to Newton's method.
Visually, you can see the relationship here (Blinn 1997, p. 81):
A similar, idealized graph appears in Mitchell 1961 (p. 513, linked below):
This same relationship is shown in Jean-Michel Muller, "Elementary Functions and Approximate Computing," Proceedings of the IEEE, vol. 108, no. 12, p. 2137, Dec. 2020.
This characteristic is not unique to IEEE 754 floating-point numbers, though that is where it is most famously exploited, in Quake III Arena's fast inverse square root, which uses a standard float. However, it will also work in an abstract format like Knuth's didactic MIX format, or with others, including the examples below.
To the best of my understanding, the relationship was first recorded in the academic press in John Mitchell's 1961 "Computer Multiplication and Division Using Binary Logarithms," which demonstrates the method with an abstract binary logarithm system. William Kahan implemented a similar scheme around 1961 or 1962 on an IBM 7090. We know as well, in part due to Noah Hellman's master's thesis, Mitchell-Based Approximate Operations on Floating-Point Numbers, that it was also implemented in UNIX as the system square root from 1974 until the creation of a libm for BSD 4.3 in the mid 1980s. Around that same time, Kahan and his graduate student K-C Ng produced, but did not publish, a 'magic' square root using this method, designed to operate not on floats but on doubles, which would be halved and operated on as 32-bit parts (see a related Stack Overflow question).
It will not, however, work for all floating point formats. DEC's VAX floating point format interleaves the sign, exponent and fraction portion (which it splits into two) so it won't do.
I don't care (for this question) about how engineers came to learn about this relationship, or where it has been exploited (cf. the FISR). I want to know why it exists. Historical questions I've asked which relate to the topic can be found on the Retrocomputing Stack Exchange: early examples of indexing float bits in C, and type punning and C standards/compilers.
I'm also not asking how to do it, though a good answer might include an illustrative example with some code. However, you should first take a look at this amazing FISR question on codegolf to see some really interesting ones.
A bad answer might be "any working engineer with a slide rule would understand this relationship," though I could imagine that a great answer might resort to a slide rule. Another bad answer might be "because Knuth says so," while a very good answer might include that sentiment amidst an explanation which depends on section 4.2.4 of Volume 2 of TAOCP (I don't have page numbers, I have O'Reilly's pagination abomination, it is pp. 240-247 in the print first edition).
An excellent answer should provide a clear understanding of the nature of the relationship and why it arises. A proof that the relationship must hold would be intriguing, though I can foresee some challenges.
I am writing a history of the Fast Inverse Square Root, information about which is collected at 0x5f37642f.com. Part of investigating this as a historian means asking questions like this from the community. I do not have the expertise to distill, by myself, much of this because I am not a mathematician or a computer scientist.
The FISR has basically three interesting parts: the logarithmic approximation, the "magic" constant, and Newton's method. When folks write about the FISR, they tend to write about the magic constant, then Newton's method, and spend very little time on the approximation. I get the feeling from the broader 'literature' that the approximation is widely known but not as clearly well understood. What I learn from answers will help me understand it.
An excellent answer will be noted in the acknowledgements of papers or talks which come out of this work.
I'm also asking because c'mon, no one has asked this yet and we've been here HOW long?
I'm going to limit my answer to the question, "Why does the integer representation of a floating point number offer a piecewise linear approximation to the logarithm?". The fast inverse square root is a fascinating topic, but I don't know enough about it to talk about it.
But it's actually pretty easy to answer the question in the title. First I'll start with the "TL;DR" summary:
This approximation works because when a number is an exact power of the base, its logarithm is (trivially) exactly its exponent when expressed in exponential notation. Floating-point formats like IEEE-754 are base-2 exponential notations, and the exponent occupies almost the most significant bits of the format, excepting only the sign bit, which is 0 for positive numbers and so can be ignored. And all of the lower-order bits (that is, beneath the exponent) form a perfectly linear progression, notionally from 000 to 999, due in part to the fact that IEEE-754 formats use normalized significands with a "hidden" leading 1 bit.
The longer explanation starts with a review: What do we mean by a logarithm? Well, it's the inverse of the exponential or power function. Exponentials say things like, ten to the second power (10^2) is 100. So the (base 10) logarithm function asks things like, What power do we have to raise 10 to to get the number 100? The answer, of course, is 2.
You can also ask things like, "What power do we have to raise 10 to to get the number 50?" That's obviously harder, because 50 isn't an even power of 10. There's some very cool math behind this which I am not going to try to explain, but raising things to fractional powers is defined in such a way that 10^1.69897 is 50. So the log (base 10) of 50 is 1.69897.
And then you don't have to limit yourself to powers of 10, either. Mathematicians like to use powers of the special number e (2.71828), and of course computer programmers love (just love) to use powers of 2. What power do we have to raise 2 to in order to get the number 32? Every computer geek knows it's 5. So the log (base 2) of 32 is 5.
And then there's scientific notation. Let's take the number 12345. One way of representing that in scientific notation is 1.2345 × 10^4. And it turns out that gives us a hint as to what the base-10 logarithm of 12345 is. It's going to be greater than or equal to 4, and less than 5. (In fact it's about 4.09149.) This works as long as we pick scientific representations that are normalized in a certain way, with the significand part (popularly called the "mantissa") chosen to be between 1 and 10. (That is, we wouldn't get a good estimate of the logarithm in this way if we'd looked at the otherwise-equivalent notations 0.12345 × 10^5 or 12.345 × 10^3.)
So now let's turn to computer floating-point representations. These typically use — and IEEE-754 definitely uses — base-2 (binary) exponential representations, that is, with normalized significand values multiplied by some power of 2. For example, the number 1.125 is represented as 1.125 × 2^0, the number 25.25 is represented as 1.578125 × 2^4, and the number 0.1875 is represented as 1.5 × 2^-3.
And, just as for base 10, those exponents give us a rough, integer-only approximation of the base-2 logarithm — again, as long as we use normalized significands, in this case between 1 and 2. So log2(25.25) is going to be somewhere between 4 and 5.
It's also time for me to foreshadow a super-special secret fact implied by the fact that we're restricting ourselves to normalized significand values. I said they were going to be "between 1 and 2", but more precisely, they're going to range from 1.00000 to 1.99999. But notice that no matter what the significand value is, for base 2 the first digit is always going to be 1. In a minute we'll see what we can do with that.
The IEEE-754 binary floating-point representations all consist of a sign bit, followed by some number of exponent bits, followed by some number of significand bits. For example, for our number 25.25, we're going to have a 0 bit representing a positive sign, followed by some representation of the exponent 4, followed by some representation of the significand 1.578125.
But now we're getting a glimmer of how the raw-bits interpretation of a floating-point number might have something to do with a logarithm. For positive numbers we don't have to worry about the sign bit (since it's 0), and the next thing we see is going to be the exponent, and we've already seen that the exponent is a rough (integer-only) approximation of the logarithm. So all that's left to ask is, how are we going to interpolate between the power-of-two values that have exact base-2 logarithms (log2(8) = 3, log2(16) = 4, log2(32) = 5, etc.) and the fractional ones in between?
If I gave you evenly-spaced numbers like 100, 200, 300 and asked you to fill in the numbers in between, you'd have little trouble: you'd replace the 00 part by 01, 02, 03, ..., up to 99. And if you knew anything about binary numbers, and I asked you to do the same for evenly-spaced binary numbers like

00100000
01000000
01100000
10000000

you could do more or less the same thing, replacing the 00000 part by consecutive binary numbers 00001, 00010, 00011, etc.
Remarkably enough, we're about to see that something exactly like this goes on when we encode significand values in IEEE-754 floating-point numbers.
Let's start looking at an actual IEEE-754 floating-point format. We'll use single-precision float32, to keep the numbers halfway reasonable. This format uses 1 bit for the sign (S), 8 bits for the exponent (e), and 23 bits for the significand (s):
S e e e e e e e e s s s s s s s s s s s s s s s s s s s s s s s
Let's start figuring out what the representations of the numbers 16.000 = 2^4 and 32.000 = 2^5 would look like.
16 is 1.0 × 2^4. So the significand is 1.0, and the exponent is 4. How will these be stored? I'm going to concentrate on the significand first. Remember the observation that the first digit (the one to the left of the binary point) is always 1? IEEE-754 takes the position that this always-1 leading bit does not need to be stored at all. The actual bits stored for a significand of 1.0 will be all 0's. The exponent will be some representation of 4. So I'm going to represent the number 16.000 like this:
0 4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Those "4"s are approximate; I still haven't gotten around to explaining how the exponent is encoded.
The number 32.000 is similar — actually, it's identical, except for the exponent:
0 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
But the key point I'm building up to is that the numbers in between — those greater than 16.0, but less than 32.0, will all be variations on
0 4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
except with the 0's replaced by all the consecutive 23-bit binary numbers — all 2^23 − 1, or 8388607, of them — from 000…001 up to 111…111.
So now we've got just about all the pieces in place to see how "the integer representation of a floating point number offers a piecewise linear approximation to the logarithm". We've shown that the binary representations of the numbers between 16 and 32 form a nice, linear ramp from a number that starts with something like 4, to a number that starts with something like 5.
I've been ducking the question of how the exponents are encoded. That's done using a bias scheme. For binary32, the bit pattern stored is the actual exponent value, plus 127. So our exponent value of 4 will be stored as a bit pattern of 4+127=131, and 5 will be stored as 132. So the actual encodings of 16.0 and 32.0 will be
0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
where I have filled in the binary values of the encoded exponents 131 and 132, namely 10000011 and 10000100. And now we can see the actual, raw values 01000001100000000000000000000000 and 01000010000000000000000000000000, or 0x41800000 and 0x42000000, or 1098907648 and 1107296256.
Those numbers don't have anything obvious to do with the base-2 logarithms of 16 and 32 (namely, 4 and 5), but they're actually very close. They're offset by the effect of the bias, and they're completely wrong in magnitude. But that's not surprising: by taking 32-bit quantities and treating them as integers, we're getting big numbers (which are indeed integers), but the actual values for these logarithms should be 4.0 and 5.0, and we'd like the possibility of a fractional part. So if we're going to treat these big integers as some rendition of the logarithms of some other, smaller numbers, it's going to involve something very much like a fixed-point representation.
Deriving the details of that fixed-point representation is straightforward: the scaling factor is clearly 2^23 = 8388608, and there's also an offset, corresponding to the bias built in to the binary32 format. That offset will be 1065353216 (127 × 2^23) if applied before scaling, or 127 afterwards.
Let's see if this works. We have come up with integer values of 1098907648 and 1107296256 corresponding to 16.0 and 32.0. So:
(1098907648 - 1065353216) / 8388608 = 4
(1107296256 - 1065353216) / 8388608 = 5
or the other way:
1098907648 / 8388608 - 127 = 4
1107296256 / 8388608 - 127 = 5
So it's working!
It's said that a picture is worth 1,000 words, and it's long past time to do some of this explaining that way. Here's a graph which — not coincidentally — looks an awful lot like the one you posted from Mitchell 1961:
The red curve is the actual base-2 log function. The black dots are the places where log2(X) is an integer. Those are the points where the raw bits of a binary32 float value, interpreted as an integer and suitably scaled, are exactly equal to log2. And then in blue we have straight line segments directly connecting those points.
To recap: If we take the raw bits of binary32 float values corresponding to exact powers of 2, we get:
X | binary32 (hex) | binary32 (dec) | log2
---|---|---|---
0.25 | 3e800000 | 1048576000 | -2
0.5 | 3f000000 | 1056964608 | -1
1 | 3f800000 | 1065353216 | 0
2 | 40000000 | 1073741824 | 1
4 | 40800000 | 1082130432 | 2
8 | 41000000 | 1090519040 | 3
16 | 41800000 | 1098907648 | 4
32 | 42000000 | 1107296256 | 5
An exact power of two has a significand of 1.0 (that is, these numbers are all 1.0 × 2N), meaning that they are encoded in IEEE-754 as all 0's (due to the "hidden 1 bit" property). And since they're positive, their sign bit is 0 also, and the only part that isn't 0 is the exponent. And since for exact powers the exponent is the logarithm, for these numbers, the integer interpretation of the raw bit pattern is always exactly the logarithm, after shifting and scaling. (I presume that some aspect of that "shifting and scaling" is where the magic number 0x5f37642f comes in.)
And then, for all the numbers that aren't exact powers of 2, and for which the logarithm isn't an integer, the way the significand values are laid out means that the raw values (as shown by the blue lines in the plot above) form a perfect linear interpolation between the exact powers. Therefore, in (almost) all cases, the raw bit patterns do indeed end up being a "piecewise linear approximation to the logarithm". (I said "almost" because this result does not hold for the subnormals, and for the special values Inf and NaN.)
The other interpretation of the question "Why" is, "Why did the designers of IEEE-754 set things up this way?" I don't believe it was deliberate, that is, I don't believe that it was a requirement that the raw bit pattern, interpreted as an integer, corresponded to a piecewise linear approximation of the base-2 logarithm. However, given these three related facts:

- the sign bit, which is 0 for positive numbers, occupies the highest-order position;
- the biased exponent occupies the next-highest bits; and
- the significand is stored normalized, with its always-1 leading bit hidden, so its bits fill the low-order positions as a linear ramp;

the "piecewise linear approximation" result falls out rather neatly.