math · floating-point · bit-manipulation · simd

Fast bithacked log2 approximation


To estimate the entropy of a sequence, I came up with something like this:

#include <bit>  // std::bit_cast (C++20)

float log2_dirty(float x) {
    // Shift part of the exponent into the mantissa field, re-bias, subtract the offset.
    return std::bit_cast<float>((std::bit_cast<int>(x) >> 4) + 0x3D800000) - 15.0f;
}

This is loosely based on the approximation log2(1+x) ~= x, which over the interval x = [0..1] has a maximum error of about 8%.
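
(Quick throwaway check of that error figure, scanning the interval numerically; the maximum comes out near 0.086:)

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    float worst = 0.f;
    for (float x = 0.f; x <= 1.f; x += 1e-4f)
        worst = std::max(worst, std::log2(1.f + x) - x);
    std::printf("max error of log2(1+x) ~= x on [0,1]: %.4f\n", worst);  // ~0.086
}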

Also, using log2(2^e * (1+a)) ~= e + a, the idea is to move part of the exponent into the linear (mantissa) part; after adding a bias, the result matches the true value reasonably closely over some range of the original value x, as in

            appr    std::log2
x = 2.0        1            1
x = 3.0      1.5      1.58496
x = 4.0        2            2
x = 5.0     2.25      2.32193
x = 6.0      2.5      2.58496
...
x = 29.0  4.8125      4.85798
x = 30.0   4.875      4.90689
x = 31.0  4.9375       4.9542
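
(A small harness along these lines reproduces the comparison, using log2_dirty from above:)

#include <cmath>
#include <cstdio>

int main() {
    const float xs[] = {2.f, 3.f, 4.f, 5.f, 6.f, 29.f, 30.f, 31.f};
    for (float x : xs)
        std::printf("x = %4.1f  %8g  %10g\n", x, log2_dirty(x), std::log2(x));
}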

Another parametrisation would be (x >> 5) + 0x40000000 in the integer domain, then subtracting 31.0f in the floating-point domain.
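
Written out (a sketch, using std::bit_cast as above; the shift is parenthesised explicitly since >> binds looser than +):

// Alternative parametrisation: shift by 5, bias 0x40000000, subtract 31 as a float.
float log2_dirty_v2(float x) {   // name is just for illustration
    return std::bit_cast<float>((std::bit_cast<int>(x) >> 5) + 0x40000000) - 31.0f;
}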

The ranges where this works are 2 <= x < 32.f or 2.0 <= x < 64.f, but I'm wondering if and how I could also include the range 1..2, or even 1/16 <= x < 32, with as little tinkering as possible.

The best I could come up with was return log2_dirty(x * 16) - 4.0f for the 1/16 <= x < 2 range, but maybe something better is available (I'm also targeting SIMD and branchless code).
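
As a branchless sketch (reusing log2_dirty from above; the helper name is mine):

// Scale the input by 16 (adds 4 to the exponent) to shift the valid window down,
// then take the 4 back out in the float domain. Intended for the 1/16 <= x < 2 range.
float log2_dirty_low(float x) {
    return log2_dirty(x * 16.0f) - 4.0f;
}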

One additional target is to compute n log2 n, as that is used for entropy computation. (Here I'm actually not using p log2 p with p <= 1.0, but positive values n, which I can clamp to 1.0 if necessary and which in my case are also limited to n < 31.)
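
(For the entropy term itself, a minimal sketch along those lines, again using log2_dirty from above and a placeholder name of my own:)

#include <algorithm>

// n * log2(n), with n clamped to >= 1 so small counts contribute zero.
float n_log2_n_dirty(float n) {
    n = std::max(n, 1.0f);
    return n * log2_dirty(n);
}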

TL;DR: how best to extend the applicable domain of the algorithm -- or maybe there's another reasonable variant, similar to the traditional fast_exp algorithm as in https://stackoverflow.com/a/78509807/1716339


Solution

  • On x86 architectures with native FMA support, I guess the fastest way is

    return static_cast<float>(std::bit_cast<int>(x)) * (1.f/(1 << 23)) - 127.f;
    

    You get slightly less bit loss (i.e., more monotonic behavior) if you first subtract the offset in the integer domain and scale afterwards:

    return static_cast<float>(std::bit_cast<int>(x) - 0x3f800000) * (1.f/(1 << 23));
    
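    Wrapped into complete functions, a sketch of both variants could look like this (assuming a 32-bit int, IEEE-754 binary32 floats and C++20 std::bit_cast; the function names are mine). The multiply and subtract in the first version is exactly the shape an FMA instruction can fuse:

    #include <bit>

    // Reinterpret the bits as an integer, convert to float, scale by 2^-23 and
    // remove the exponent bias of 127.
    float log2_linear(float x) {
        return static_cast<float>(std::bit_cast<int>(x)) * (1.f / (1 << 23)) - 127.f;
    }

    // Same idea, but with the bias removed in the integer domain first.
    float log2_linear_v2(float x) {
        return static_cast<float>(std::bit_cast<int>(x) - 0x3f800000) * (1.f / (1 << 23));
    }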

    If you want to avoid the static_cast, your idea is generally sound, and by scaling the input value you can move the window of valid values. Notice though that scaling by a power of two just adds to or subtracts from the exponent (as long as you don't leave the range of normalized numbers). Thus by subtracting (in the integer domain) the bit pattern of your lowest possible input value, you get an exponent in the range [0, Emax-Emin]; if Emax-Emin is 2^N, you can shift right by N to move the reduced exponent into the significand of the floating-point representation, after which you add (still in the integer domain) the bit pattern of the corresponding power of two 2^N, bit-cast back to float, and subtract (as a float) (2^N) - Emin.

    Example, N=3, Emin=-3, Emax=5 (i.e., valid between 1.f/8 and 32.f):

    return std::bit_cast<float>(
               ((std::bit_cast<int32_t>(x) - std::bit_cast<int32_t>(1.f/8)) >> 3)
               + std::bit_cast<int32_t>(8.f))
           - (8.f - -3.f);
    

    Now by moving the subtraction after the shift, you get an expression exactly in your desired form:

    return std::bit_cast<float>(
               (std::bit_cast<int32_t>(x) >> 3)
               + (std::bit_cast<int32_t>(8.f) - (std::bit_cast<int32_t>(1.f/8) >> 3)))
           - (8.f - -3.f);
    

    If you shift by 4 instead of 3, you can change the constants to get a range of 16 in the exponent, e.g., -8 to 8, i.e. 1/256 to 256.
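
    For instance, with N=4, Emin=-8 and Emax=8, a sketch of that variant (in the same moved-subtraction form as above, valid roughly for 1/256 <= x < 256) would be:

    return std::bit_cast<float>(
               (std::bit_cast<int32_t>(x) >> 4)
               + (std::bit_cast<int32_t>(16.f) - (std::bit_cast<int32_t>(1.f/256) >> 4)))
           - (16.f - -8.f);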

    N.B.: Since the approximation is always too small, except for powers of two, you could add something around 0.043 to reduce the maximal absolute error.
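
    For example, folding that offset into the float-domain constant of the last expression above (a sketch; 0.043 is just the ballpark figure mentioned):

    return std::bit_cast<float>(
               (std::bit_cast<int32_t>(x) >> 3)
               + (std::bit_cast<int32_t>(8.f) - (std::bit_cast<int32_t>(1.f/8) >> 3)))
           - (8.f - -3.f - 0.043f);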