Half-precision floating-point


I have a small question about IEEE-754 half precision.

1) I have the following exercise: 13.7625 shall be written in 16 bits (half precision).

So I converted the number from decimal to binary and got 13.7625 = 1101.1100001100₂.

Normalized, that is 1.1011100001100 × 2³.
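A quick way to double-check a hand conversion like this is to expand the fraction with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the helper name `to_binary` is only illustrative. It shows that 0.7625 does not terminate in binary (after 1100 the pattern 0011 repeats), so 1101.1100001100₂ is a truncation, and it then normalizes the value to the 1.xxx × 2³ form.

```python
from fractions import Fraction


def to_binary(x: Fraction, frac_bits: int) -> str:
    """Binary expansion of a non-negative x, truncated to frac_bits fractional bits."""
    integer = int(x)              # int() truncates; equals the floor for x >= 0
    frac = x - integer
    digits = []
    for _ in range(frac_bits):
        frac *= 2                 # shift the next fractional bit into the integer place
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return f"{integer:b}." + "".join(digits)


value = Fraction("13.7625")                   # exact decimal value, no float rounding
print(to_binary(value, 20))                   # 1101.11000011001100110011

# Normalize to 1.xxx * 2**exponent (valid here because value >= 1).
exponent = int(value).bit_length() - 1        # 3, since 8 <= 13.7625 < 16
significand = value / 2 ** exponent           # 1101/640 = 1.7203125
print(exponent, to_binary(significand, 20))   # 3 1.10111000011001100110
```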

My professor gave us the solution, and as far as I can tell my mantissa and binary conversion are right, but for the exponent he states that it's 19 = 10011, which I don't get. Can the bias be 16? According to Wikipedia it's:
- 15 for half precision
- 127 for single precision
- 1032 for double precision

Can you please point out what I did wrong?

2) One other question: what would the exponent bias be for the following format: 1 sign bit + 4 mantissa bits + 3 exponent bits? And why?

Thanks.


Solution

  • 1) I have the following exercise: 13.7625 shall be written in 16 bits (half precision).

    So I converted the number from decimal to binary and got 13.7625 = 1101.1100001100₂.

    Your mantissa conversion is correct, and so is your exponent. The exponent bias for half precision is 15 (https://en.wikipedia.org/wiki/Half-precision_floating-point_format), so the biased exponent is 3 + 15 = 18 = 10010, not 19 = 10011.
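The biased exponent can be checked with a few lines of Python. This is a sketch assuming Python 3.6+, whose `struct` module packs IEEE-754 binary16 with the `'e'` format code and rounds to the nearest representable value; the variable names are illustrative.

```python
import struct
from fractions import Fraction

BIAS_HALF = 15                                  # 2**(5 - 1) - 1 for the 5-bit exponent field
value = Fraction("13.7625")

exponent = int(value).bit_length() - 1          # unbiased exponent from 1.1011100001100... x 2^3
biased = exponent + BIAS_HALF
print(biased, format(biased, "05b"))            # 18 10010

# Cross-check with the standard library's binary16 packing (big-endian, 'e' format).
bits = struct.unpack(">H", struct.pack(">e", 13.7625))[0]
sign = bits >> 15
exp_field = (bits >> 10) & 0x1F                 # 5 exponent bits
frac_field = bits & 0x3FF                       # 10 stored fraction bits (hidden 1 omitted)
print(sign, format(exp_field, "05b"), format(frac_field, "010b"))  # 0 10010 1011100010
```

The stored fraction comes out as 1011100010 rather than the truncated 1011100001 because the bits dropped after the tenth place (10011…) are worth more than half a ULP, so round-to-nearest goes up.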

    One other question: what would the exponent bias be for the following format: 1 sign bit + 4 mantissa bits + 3 exponent bits? And why?

    The IEEE-754 rule is that if the exponent is coded with n bits, the bias is 2ⁿ⁻¹ - 1. This applies to single precision (8 bits, bias 2⁷ - 1 = 127), double precision (11 bits, bias 2¹⁰ - 1 = 1023, not 1032; there is a small typo in the question), half precision (5 bits, bias 2⁴ - 1 = 15), etc.
    For an exponent field of 3 bits, this gives a bias of 2² - 1 = 3.
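The rule, and one way to see why it is chosen, can be sketched as follows. `ieee_bias` and `normal_exponent_range` are hypothetical helper names, and the second one assumes the IEEE-754 convention that the all-zeros exponent field (zero/subnormals) and the all-ones field (infinities/NaN) are reserved. Under that assumption, a bias of 2ⁿ⁻¹ - 1 splits the normal exponents almost symmetrically, from 1 - bias up to +bias; for the 3-bit field that is -2 to 3.

```python
from typing import Tuple


def ieee_bias(exp_bits: int) -> int:
    """Exponent bias of an IEEE-754-style binary format with an exp_bits-wide field."""
    return 2 ** (exp_bits - 1) - 1


def normal_exponent_range(exp_bits: int) -> Tuple[int, int]:
    """Unbiased exponents available to normal numbers.

    Assumes IEEE-754 conventions: field 0 is reserved for zero/subnormals and
    the all-ones field for infinities/NaN, so normal fields run 1 .. 2**exp_bits - 2.
    """
    bias = ieee_bias(exp_bits)
    return 1 - bias, (2 ** exp_bits - 2) - bias


for name, n in [("3-bit exponent", 3), ("half", 5), ("single", 8), ("double", 11)]:
    print(f"{name}: bias {ieee_bias(n)}, normal exponents {normal_exponent_range(n)}")
```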

    For your coding problem, this gives an exponent code of 3 + 3 = 6 = 110. For the mantissa, it depends on the rounding policy. If the mantissa is rounded toward 0, we can code 1.1011(100001100) by just dropping the trailing bits, and the final code (sign.exponent.mantissa) would be
    0.110.1011.
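The truncating (round-toward-zero) encoding for the 1 + 3 + 4 format can be sketched like this. `floor_log2` and `encode_truncated` are hypothetical names; the code assumes a positive, normal, in-range input and uses exact `Fraction` arithmetic so that no binary floating-point error leaks into the check.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 3, 4
BIAS = 2 ** (EXP_BITS - 1) - 1                 # 3


def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for x > 0."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1


def encode_truncated(x: Fraction) -> str:
    """Encode a positive x as 'sign.exponent.fraction' in the 1+3+4 format, rounding toward zero.

    Sketch only: no handling of zero, negatives, overflow, or subnormals.
    """
    e = floor_log2(x)
    frac = x / Fraction(2) ** e - 1            # fractional part of the significand, in [0, 1)
    frac_field = int(frac * 2 ** FRAC_BITS)    # keep FRAC_BITS bits, drop the rest
    return f"0.{e + BIAS:0{EXP_BITS}b}.{frac_field:0{FRAC_BITS}b}"


print(encode_truncated(Fraction("13.7625")))   # 0.110.1011
```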

    But the rounding error is then slightly greater than 0.5 ULP (0.1000011…₂ ULP, i.e. 0.525 ULP), and to minimize it, 1.1011100001100… should be rounded to 4 fractional bits by adding 1 to the ULP.

      1.1011 
    +      1
    = 1.1100
    

    and the final code would be 0.110.1100, which decodes to 1.1100₂ × 2³ = 14 (versus 13.5 for the truncated code).
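To compare the two candidate codes numerically, here is a sketch in the same exact-arithmetic style (the variable names are illustrative). It measures both rounding errors in ULPs at exponent 3, where one ULP of a 4-bit fraction is 2³/2⁴ = 0.5, and picks the closer code. That is what IEEE-754's default round-to-nearest (ties-to-even) mode does when there is no tie.

```python
from fractions import Fraction

FRAC_BITS, BIAS = 4, 3
x = Fraction("13.7625")
e = 3                                          # from the normalization 1.1011100001100... x 2^3
ulp = Fraction(2) ** e / 2 ** FRAC_BITS        # gap between adjacent codes at this exponent: 1/2

down = int(x / ulp) * ulp                      # truncated candidate, 1.1011 x 2^3 = 13.5
up = down + ulp                                # next code up,       1.1100 x 2^3 = 14
err_down = (x - down) / ulp                    # 21/40 = 0.525 ULP, slightly over half
err_up = (up - x) / ulp                        # 19/40 = 0.475 ULP

# Round to nearest; there is no tie here, so ties-to-even never comes into play.
nearest = up if err_up < err_down else down
# Strip the hidden leading 1. (A carry into the exponent, e.g. 1.1111 + 1 ULP,
# is not handled in this sketch.)
frac_field = int(nearest / Fraction(2) ** e * 2 ** FRAC_BITS) - 2 ** FRAC_BITS
print(float(err_down), float(err_up))          # 0.525 0.475
print(f"0.{e + BIAS:03b}.{frac_field:04b}", float(nearest))  # 0.110.1100 14.0
```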