c++ floating-point bit-manipulation

How to write an std::floor function from scratch


I would like to know how to write my own floor function to round a float down.

Is it possible to do this by setting to 0 the bits of a float that represent the digits after the decimal point?

If yes, then how can I access and modify those bits?

Thanks.


Solution

  • You can do bit twiddling on floating point numbers, but getting it right depends on knowing exactly what the floating point binary representation is. For most machines these days it's IEEE-754, which is reasonably straightforward. For example, IEEE-754 32-bit floats have 1 sign bit, 8 exponent bits, and 23 mantissa bits, so you can use shifts and masks to extract those fields and do things with them. So doing trunc (round to integer towards 0) is pretty easy:

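    #include <cstdint>
    #include <cstring>

    // Named my_trunc so it does not clash with the standard trunc from <cmath>.
    float my_trunc(float x) {
        std::uint32_t bits;
        // Copy out the bit pattern; memcpy is used because reading an
        // inactive union member is undefined behavior in C++.
        std::memcpy(&bits, &x, sizeof bits);
        int exponent = (bits >> 23) & 0xff; // extract the biased exponent field
        int fractional_bits = 127 + 23 - exponent;
        if (fractional_bits > 23) // abs(x) < 1.0
            return 0.0f;
        if (fractional_bits > 0)
            bits &= ~((1U << fractional_bits) - 1); // clear the fractional bits
        std::memcpy(&x, &bits, sizeof x);
        return x;
    }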
    

    First, we extract the exponent field and use it to calculate how many of the mantissa bits lie after the binary point. If that count is larger than the size of the mantissa (23 bits), the magnitude of the number is less than 1, so we just return 0. Otherwise, if there's at least 1 such bit, we mask off (clear) that many low bits. Pretty simple. We're ignoring denormals, NaN, and infinity here, but that works out OK: they have exponents of all 0s or all 1s, which means we end up converting denormals to 0 (they get caught in the first if, along with small normal numbers) and leaving NaN/Inf unchanged.
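
    As a small worked example of the calculation described above, here is a sketch that only computes the bit count. The helper name fractional_bit_count is made up for illustration, and the code assumes float is an IEEE-754 32-bit value. For 5.75f (bit pattern 0x40B80000, exponent field 129) it gives 150 - 129 = 21, and clearing the low 21 bits leaves 0x40A00000, which is 5.0f.

    ```cpp
    #include <cstdint>
    #include <cstring>

    // Hypothetical helper (not part of any library): number of low mantissa
    // bits of x that lie after the binary point.
    // Assumes float is IEEE-754 binary32.
    //   result > 23  : |x| < 1, the whole value is fractional
    //   result 1..23 : that many low mantissa bits are fractional
    //   result <= 0  : no fractional bits (also the case for NaN/Inf)
    int fractional_bit_count(float x) {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);      // copy out the bit pattern
        const int exponent = (bits >> 23) & 0xff; // biased exponent field
        return 127 + 23 - exponent;               // bias + mantissa width - exponent
    }
    ```

    For instance, 1.0f gives 23 (all mantissa bits are fractional), 0.5f gives 24, and 8388608.0f (2^23) gives 0.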

    To do a floor, you'd also need to look at the sign and round negative numbers that have a fractional part 'up' in magnitude, i.e. towards negative infinity.
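
    One way this could look is sketched below. It is only a sketch under assumptions: float is taken to be IEEE-754 32-bit, the name floor_bits is made up for illustration, and std::memcpy is used to get at the bits. The idea is to truncate as before, then subtract 1 when the input was negative and some fractional bits were actually cleared.

    ```cpp
    #include <cstdint>
    #include <cstring>

    // Hypothetical name; assumes float is IEEE-754 binary32.
    float floor_bits(float x) {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);           // copy out the bit pattern
        const bool negative = (bits >> 31) != 0;       // sign bit
        const int exponent = (bits >> 23) & 0xff;      // biased exponent field
        const int fractional_bits = 127 + 23 - exponent;

        if (fractional_bits <= 0)                      // already an integer, or NaN/Inf
            return x;

        if (fractional_bits > 23) {                    // abs(x) < 1.0
            if ((bits & 0x7fffffffu) == 0)             // +0 or -0 stays as it is
                return x;
            return negative ? -1.0f : 0.0f;            // e.g. -0.5 -> -1, 0.5 -> 0
        }

        const std::uint32_t mask = (1u << fractional_bits) - 1;
        if ((bits & mask) == 0)                        // no fractional part set
            return x;

        bits &= ~mask;                                 // truncate towards zero
        float truncated;
        std::memcpy(&truncated, &bits, sizeof truncated);
        // Negative values with a fraction go one further, towards -infinity.
        // abs(truncated) < 2^23 here, so subtracting 1 is exact.
        return negative ? truncated - 1.0f : truncated;
    }
    ```

    With this sketch, floor_bits(2.7f) yields 2.0f, floor_bits(-2.7f) yields -3.0f, and floor_bits(-2.0f) stays -2.0f. Unlike the trunc version above, denormals are not all flushed to 0: a negative denormal rounds down to -1.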

    Note that this is almost certainly slower than using dedicated floating point instructions, so this sort of thing is really only useful if you need to use floating point numbers on hardware that has no native floating point support, or if you just want to play around and learn how these things work at a low level.