I am trying to implement a fast float = log2(float). I wrote a simple C program to compare my results with log2 and I get a small error that I can't find a source for.
I am taking the approach of identifying the floating point representation (ignoring sign bit) is 2^(exponent) * 1.significand
. Using properties of logs I get log2(float) = exp + log2(1.significand)
. Eventually I'll truncate the significand for a table lookup, but for now I wanted to verify the correct result.
For further background reading on inspiration for this: http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf
Herein lies the problem. This is a simple program that extracts floating point bits and adds the exponent to the log2(significand).
#include <math.h>
#include <stdint.h>
#include <stdio.h>
int main()
{
typedef union {
int32_t i;
float f;
} poly32_t;
float x = 31415926535.8;
poly32_t one;
one.f = 1.0f;
uint32_t ii;
uint32_t num_iter = 15;
for(ii=0; ii < num_iter; ++ii) {
poly32_t poly_x;
poly32_t poly_x_exponent;
poly32_t poly_x_significand;
// extract the exponent and significand
poly_x.f = x;
poly_x_significand.i = (0x007fffff & poly_x.i);
poly_x_exponent.i = (0xff & (poly_x.i >> 23) ) - 127;
// recover the hidden 1 of significand
poly_x_significand.f = 1.0 + ((float)poly_x_significand.i)/10000000;
// log2(2^exp * sig) = exponent + log2(significand)
float log_sig = log2(poly_x_significand.f);
float y_approx = (float)poly_x_exponent.i + log_sig;
// Get the actual value
float y_math = log2(x);
printf("math::log2(%16.4f)=%8.4f ; approx=%8.4f ; diff=%.4f\n",
x, y_math, y_approx, y_math-y_approx);
x *= 0.1;
}
return 1;
}
math::log2(31415926784.0000)= 34.8708 ; approx= 34.7614 ; diff=0.1094
math::log2( 3141592576.0000)= 31.5488 ; approx= 31.4733 ; diff=0.0755
math::log2( 314159264.0000)= 28.2269 ; approx= 28.1927 ; diff=0.0342
math::log2( 31415926.0000)= 24.9050 ; approx= 24.7924 ; diff=0.1126
math::log2( 3141592.5000)= 21.5831 ; approx= 21.5036 ; diff=0.0794
math::log2( 314159.2500)= 18.2611 ; approx= 18.2221 ; diff=0.0390
math::log2( 31415.9258)= 14.9392 ; approx= 14.8235 ; diff=0.1158
math::log2( 3141.5925)= 11.6173 ; approx= 11.5340 ; diff=0.0833
math::log2( 314.1592)= 8.2954 ; approx= 8.2517 ; diff=0.0437
math::log2( 31.4159)= 4.9734 ; approx= 4.8546 ; diff=0.1188
math::log2( 3.1416)= 1.6515 ; approx= 1.5644 ; diff=0.0871
math::log2( 0.3142)= -1.6704 ; approx= -1.7187 ; diff=0.0483
math::log2( 0.0314)= -4.9924 ; approx= -4.9936 ; diff=0.0012
math::log2( 0.0031)= -8.3143 ; approx= -8.4050 ; diff=0.0907
math::log2( 0.0003)=-11.6362 ; approx=-11.6890 ; diff=0.0528
The "approximation" should be exactly the same as the clib's log2 at this point; any help identifying my error would be greatly appreciated.
Edited (2) to remove incorrect information about the number of significand bits
Furthermore, I think this is wrong:
// recover the hidden 1 of significand
poly_x_significand.f = 1.0 + ((float)poly_x_significand.i)/10000000;
The (explicit part of the) significand is a binary fraction, not a decimal one. You should be dividing by (float) (1 << 23)
.
Note, too, that it doesn't look like your implementation deals correctly with subnormal numbers.