math floating-point binary

Representation of fixed point numbers


I want to understand a binary number representation in which the maximum value is 1, stored (with a width of 8 bits) as 0_1111111, and the minimum value is -1, stored as 1_0000000. Clearly the highest bit is the sign of the number, seven ones mean the maximum, 1, and seven zeros mean the minimum, -1. Most likely this is a normalized floating-point representation, but I can't make the numbers fit the normalization rules. How, for example, should I represent the number 0.02 in this case? Thanks.


Solution

  • The requirements stated in the post do not match any common format. Here are three of the least unlikely possibilities.

    Linear: A number x is represented with the eight-bit two’s complement numeral for ½(255x−1). This maps 1 to 127 (0111 1111) and −1 to −128 (1000 0000).

    Issues with this include: Zero is not representable, since x = 0 ➝ ½(255•0 − 1) = −½, which is not an integer. Linear representations with a non-zero offset are rarely used as a general format. One might be used as a specialized solution for a particular problem.
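    As a rough illustration of the linear mapping, here is a minimal Python sketch. The function names `encode_linear` and `decode_linear` are hypothetical, and rounding to the nearest integer is an assumption; the post does not say how inexact values should be handled.

```python
def encode_linear(x: float) -> int:
    """Return the 8-bit pattern for x in [-1, 1] using code = (255x - 1) / 2."""
    code = round((255 * x - 1) / 2)   # signed integer, rounded (assumed policy)
    if not -128 <= code <= 127:
        raise ValueError("x is outside [-1, 1]")
    return code & 0xFF                # two's complement bit pattern


def decode_linear(bits: int) -> float:
    """Invert the mapping: x = (2 * code + 1) / 255."""
    code = bits - 256 if bits & 0x80 else bits  # reinterpret as signed
    return (2 * code + 1) / 255


print(format(encode_linear(1.0), "08b"))    # 01111111
print(format(encode_linear(-1.0), "08b"))   # 10000000
# No bit pattern decodes to zero; the closest values are +1/255 and -1/255.
print(any(decode_linear(b) == 0 for b in range(256)))  # False
```

    The last line checks all 256 patterns and confirms that none of them decodes to exactly zero.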

    One’s complement fixed-point: A number x is represented with the eight-bit one’s complement numeral for 127x. This maps 1 to 127 (0111 1111) and −1 to −127 (1000 0000).

    In other words, the point is fixed after the first bit and before the last seven bits.

    An issue with this is that one’s complement is rarely used. This might be encountered in an academic exercise but rarely in modern practice.
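    A small Python sketch of the one’s complement reading may make this concrete. The function names are hypothetical and round-to-nearest is an assumed policy, not something stated in the post.

```python
def encode_ones_complement(x: float) -> int:
    """Return the 8-bit one's complement pattern for round(127 * x)."""
    n = round(127 * x)                # assumed rounding policy
    if not -127 <= n <= 127:
        raise ValueError("x is outside [-1, 1]")
    # A negative value is the bitwise inverse of its magnitude.
    return n if n >= 0 else (~(-n)) & 0xFF


def decode_ones_complement(bits: int) -> float:
    """Interpret an 8-bit one's complement pattern and divide by 127."""
    n = bits if bits < 0x80 else -((~bits) & 0xFF)
    return n / 127


print(format(encode_ones_complement(1.0), "08b"))    # 01111111
print(format(encode_ones_complement(-1.0), "08b"))   # 10000000
print(decode_ones_complement(0x00), decode_ones_complement(0xFF))
```

    One’s complement has two encodings of zero, 0000 0000 and 1111 1111; the last line decodes both.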

    Incorrect specification: A number x is represented with the eight-bit two’s complement numeral for 128x, and the post was incorrect to state that 0111 1111 represents 1. Instead, it represents 127/128, a number near one.

    Obviously, an issue with this is that it contradicts the post. However, “off by one” errors occur commonly in computing, and it is not uncommon to encounter situations where we want to span some aesthetic interval like [−1, 1] but there are boundary issues. Were it not for this issue, this would be a clean solution.
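    The two’s complement reading with a scale of 128 can be sketched as follows. The names `encode_q7` and `decode_q7` are hypothetical, and clamping out-of-range inputs (saturation) is an assumption made here only so that an input of 1 produces a valid pattern.

```python
def encode_q7(x: float) -> int:
    """Return the 8-bit two's complement pattern for round(128 * x)."""
    n = round(128 * x)
    # +1.0 would need the integer 128, which does not fit in 8 signed bits,
    # so clamp to the representable range (an assumed policy).
    n = max(-128, min(127, n))
    return n & 0xFF


def decode_q7(bits: int) -> float:
    """Interpret an 8-bit two's complement pattern and divide by 128."""
    n = bits - 256 if bits & 0x80 else bits
    return n / 128


print(decode_q7(0b01111111))   # 0.9921875, that is 127/128 rather than 1
print(decode_q7(0b10000000))   # -1.0
print(format(encode_q7(1.0), "08b"))  # 01111111 after clamping
```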

    In any case, there are no signs this is a floating-point format.

    0.02 does not have an exact representation in any of the formats above; rounding would be necessary.
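    For the 0.02 asked about in the question, a short Python sketch shows what rounding to the nearest code would give under each of the three readings. Round-to-nearest is an assumption, and the variable names are illustrative only.

```python
x = 0.02

# Integer codes under each interpretation (all positive here, so the
# bit pattern is just the binary form of the integer).
linear_code = round((255 * x - 1) / 2)   # exact value would be 2.05
ones_code = round(127 * x)               # exact value would be 2.54
twos_code = round(128 * x)               # exact value would be 2.56

# Values actually represented after rounding.
results = [
    ("linear", linear_code, (2 * linear_code + 1) / 255),
    ("one's complement", ones_code, ones_code / 127),
    ("two's complement", twos_code, twos_code / 128),
]
for name, code, value in results:
    print(f"{name}: {code:08b} represents {value}")
```

    Under these assumptions the patterns are 0000 0010 (5/255, about 0.0196), 0000 0011 (3/127, about 0.0236), and 0000 0011 (3/128, about 0.0234) respectively.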