I'm reading this guide about network programming, which I'm liking a lot: https://beej.us/guide/bgnet/html/split/slightly-advanced-techniques.html#serialization
I'm confused about something though. In this section about serialization, he talks about serializing ints for byte-ordering reasons, which makes sense to me, but he also includes these two functions pack754 and unpack754 for serializing floats in IEEE-754 format.
uint64_t pack754(long double f, unsigned bits, unsigned expbits)
{
    long double fnorm;
    int shift;
    long long sign, exp, significand;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (f == 0.0) return 0; // get this special case out of the way

    // check sign and begin normalization
    if (f < 0) { sign = 1; fnorm = -f; }
    else { sign = 0; fnorm = f; }

    // get the normalized form of f and track the exponent
    shift = 0;
    while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
    fnorm = fnorm - 1.0;

    // calculate the binary form (non-float) of the significand data
    significand = fnorm * ((1LL<<significandbits) + 0.5f);

    // get the biased exponent
    exp = shift + ((1<<(expbits-1)) - 1); // shift + bias

    // return the final answer
    return (sign<<(bits-1)) | (exp<<(bits-expbits-1)) | significand;
}
long double unpack754(uint64_t i, unsigned bits, unsigned expbits)
{
    long double result;
    long long shift;
    unsigned bias;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (i == 0) return 0.0;

    // pull the significand
    result = (i&((1LL<<significandbits)-1)); // mask
    result /= (1LL<<significandbits); // convert back to float
    result += 1.0f; // add the one back on

    // deal with the exponent
    bias = (1<<(expbits-1)) - 1;
    shift = ((i>>significandbits)&((1LL<<expbits)-1)) - bias;
    while(shift > 0) { result *= 2.0; shift--; }
    while(shift < 0) { result /= 2.0; shift++; }

    // sign it
    result *= (i>>(bits-1))&1? -1.0: 1.0;

    return result;
}
What I'm confused about is that these functions work by looking at the first bit for the sign, then the next X bits for the exponent, then the next Y bits for the mantissa. So doesn't that mean the float has to already be in IEEE-754 format on the host machine for this to work?
Is this just here to explain the format, or is this something you would actually do in real life?
Is Serializing Floats Necessary for Cross-Platform Network Code?
Yes. Floating-point (FP) encoding varies across implementations in size, endianness, precision, exponent range, subnormal support (and possibly even base).
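To see how much of this varies on a given host, the standard <float.h> macros can be inspected. The following is only a sketch: the function names double_looks_like_binary64 and print_fp_traits are made up for illustration, and matching these few macros suggests an IEEE-754 binary64 double but does not prove full conformance.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when the host's double has the radix,
   precision, and exponent range of IEEE-754 binary64, judged only
   from <float.h>. It says nothing about byte order or NaN encodings. */
int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 &&
           DBL_MAX_EXP == 1024 && DBL_MIN_EXP == -1021;
}

/* Hypothetical helper: prints the traits that differ between platforms. */
void print_fp_traits(void)
{
    printf("radix                     : %d\n", FLT_RADIX);
    printf("float       mantissa digits: %d\n", FLT_MANT_DIG);
    printf("double      mantissa digits: %d\n", DBL_MANT_DIG);
    printf("long double mantissa digits: %d\n", LDBL_MANT_DIG);
    printf("long double max exponent   : %d\n", LDBL_MAX_EXP);
    printf("sizeof(long double)        : %zu\n", sizeof(long double));
}
```

Calling print_fp_traits() on two different targets is a quick way to see that long double in particular is not the same type everywhere.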
So doesn't that mean the float has to already be in IEEE-754 format on the host machine for this to work?
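No, the pack/unpack will "work" (see following problems) even if long double is not IEEE. pack754() never looks at the host's bit layout: it derives the sign, exponent, and significand arithmetically (comparison, division, multiplication) and assembles the IEEE bit pattern itself. unpack754() likewise reads bits out of the uint64_t, not out of a host float.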
Is this just here to explain the format, or is this something you would actually do in real life?
Looks like learner code. I would not use the provided pack/unpack code, given its weaknesses (below) and especially the 2 very inefficient while loops. Loops may iterate thousands of times with binary128.
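To make the loop cost concrete, here is a small sketch that replays the same normalization loops and counts iterations. The names count_normalization_steps and steps_for_smallest_normal are hypothetical, and the sketch assumes a finite, positive input and a radix-2 long double.

```c
#include <float.h>

/* Hypothetical helper, not from the guide: counts how many times the
   guide-style normalization loops run for a finite, positive f. */
long count_normalization_steps(long double f)
{
    long steps = 0;
    while (f >= 2.0L) { f /= 2.0L; steps++; }  /* scale large values down */
    while (f < 1.0L)  { f *= 2.0L; steps++; }  /* scale small values up   */
    return steps;
}

/* Iterations needed for the smallest normal long double on this host. */
long steps_for_smallest_normal(void)
{
    return count_normalization_steps(LDBL_MIN);
}
```

With a radix-2 long double, steps_for_smallest_normal() should equal 1 - LDBL_MIN_EXP, which is in the thousands for the 80-bit extended and binary128 formats. Subnormal inputs take even more iterations.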
The code is a hole-riddled attempt to pack an arbitrarily encoded long double into an IEEE binary64. It does not handle values near 0.0, rounding, overflow, or infinity/NaN well.
pack754() has at least these shortcomings:
if (f == 0.0) return 0; loses information during serialization, as it returns 0 for both +0.0 and -0.0. When testing the FP sign bit, do not use if (f < 0), but if (signbit(f)), to correctly extract the sign bit even if f is zero or NaN.
long double may be more than 64 bits, so uint64_t pack754(long double f, unsigned bits, unsigned expbits) loses info in trying to pack into 64 bits. I suppose OP is tolerating this info loss.
1LL<<significandbits is UB on overflow (significandbits >= 63). 1ULL<<significandbits has some advantage, yet overflow (significandbits >= 64) remains a problem.
Using float math (the 0.5f constant) alongside the later long double math is short-sighted. ((1LL<<significandbits) + 0.5L) makes a little more sense.
Rather than while(fnorm >= 2.0) like code, use long double frexpl(long double value, int *p) to extract a normalized value and exponent. Use long double ldexpl(long double x, int p) to re-combine. while(fnorm >= 2.0) { fnorm /= 2.0; shift++; } risks an infinite loop when fnorm is infinity.
+ 0.5f for rounding has many corner-case issues. Better to use lround() and friends.
...
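Two of the suggestions above, signbit() for the sign and frexpl()/ldexpl() for normalization, can be sketched roughly as follows. The struct fp_parts and the functions split_long_double and join_long_double are names invented for this illustration, and the sketch assumes a finite input (infinity and NaN would need their own branches). Note that frexpl() normalizes to [0.5, 1) rather than the [1, 2) range the guide's loops produce, so the exponent differs by one.

```c
#include <math.h>

/* Hypothetical container for the pieces a packer would need. */
struct fp_parts {
    int sign;              /* 1 if the sign bit is set, including -0.0     */
    int exponent;          /* value magnitude = fraction * 2^exponent      */
    long double fraction;  /* magnitude in [0.5, 1), or 0 when f is zero   */
};

/* Split a finite long double without any loops. */
struct fp_parts split_long_double(long double f)
{
    struct fp_parts p;
    p.sign = signbit(f) ? 1 : 0;                 /* sees the sign of -0.0  */
    p.fraction = frexpl(fabsl(f), &p.exponent);  /* one call, no iteration */
    return p;
}

/* Re-combine the pieces produced by split_long_double(). */
long double join_long_double(struct fp_parts p)
{
    long double magnitude = ldexpl(p.fraction, p.exponent);
    return p.sign ? -magnitude : magnitude;
}
```

For example, split_long_double(8.0L) yields fraction 0.5 and exponent 4, and splitting then joining -0.0L keeps the negative sign, which the if (f == 0.0) return 0; shortcut cannot do.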
For simple cross-platform exchange of FP values, I'd consider sprintf(buf, "%La", x) as a first step to pack and strtold() to unpack.
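A minimal sketch of that text round trip is below. The helper names pack_text and unpack_text are hypothetical, and snprintf() is swapped in for sprintf() so a too-small buffer is reported rather than overrun. It assumes both ends run in the "C" locale, since the decimal-point character is locale-dependent.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes x as hexadecimal floating-point text.
   Returns the number of characters written, or -1 if buf is too small. */
int pack_text(char *buf, size_t size, long double x)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "%La", x);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: parses text produced by pack_text(). */
long double unpack_text(const char *buf)
{
    assert(buf != NULL);
    return strtold(buf, NULL);
}
```

With a radix-2 long double, "%La" with no explicit precision prints enough hex digits to represent the value exactly, so the sender loses nothing. A receiver whose long double is narrower will still round on strtold(), which is the faithfulness-versus-size trade-off again.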
Packing an FP value into a tight intN_t and maintaining precision/range faithfulness across many computer implementations are competing goals.
Which is more important: faithful conversions or small packet size?
Most systems I've worked with prize faithful conversions over small packet size.
Packing a long double, for portability, into 64 bits is simply an unwise design.