arrays, hash, cryptography, srp-protocol

Why drop the leading all-zero byte of a Java byte array before hashing?


This question is about the operations applied to byte arrays before they are hashed in Java.

I am trying to understand why multiple SRP crypto libraries drop the leading zero byte (if there is one) before hashing.

For example, this is from Bouncy Castle:

/**
 * Return the passed in value as an unsigned byte array.
 *
 * @param value value to be converted.
 * @return a byte array without a leading zero byte if present in the signed encoding.
 */
public static byte[] asUnsignedByteArray(int length, BigInteger value)
{
    byte[] bytes = value.toByteArray();
    if (bytes.length == length)
    {
        return bytes;
    }

    int start = bytes[0] == 0 ? 1 : 0;
    int count = bytes.length - start;

    if (count > length)
    {
        throw new IllegalArgumentException("standard length exceeded for value");
    }

    byte[] tmp = new byte[length];
    System.arraycopy(bytes, start, tmp, tmp.length - count, count);
    return tmp;
}

Or this, from Nimbus SRP:

public static byte[] toUnsignedByteArray(final BigInteger bigInteger) {

    byte[] bytes = bigInteger.toByteArray();
    byte[] result = toUnsignedByteArray(bytes);

    // remove leading zero if any
    if (bytes[0] == 0) {

        byte[] tmp = new byte[bytes.length - 1];

        System.arraycopy(bytes, 1, tmp, 0, tmp.length);

        return tmp;
    }
    return bytes;
}

Both examples basically drop that leading zero. The methods in those libraries are called "asUnsignedByteArray" and "toUnsignedByteArray", although I do not understand why dropping the leading zero would make the byte array unsigned. It only drops the zero byte, so the next byte becomes the leftmost byte (in big-endian order), and that byte could be negative: its leftmost bit is the sign bit, which may be set or unset depending on the byte. So if I understand the structure of the byte array correctly, those methods should not be called "to/asUnsignedByteArray" in the first place. However, the most important question is why we need to drop that byte when it is all zeros.

Here is an example using the test vectors from the SRP RFC 5054, Appendix B. We compute u from A and B, where byte zero of B happens to be all zeros in binary, i.e. if we print B as a byte array we get the following values:

public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);

[0, -67, 12, 97, 81, 44, 105, 44, 12, -74, -48, 65, -6, 1, -69, 21, 45, 73, 22, -95, -25, 122, -12, 106, -31, 5, 57, 48, 17, -70, -13, -119, 100, -36, 70, -96, 103, 13, -47, 37, -71, 90, -104, 22, 82, 35, 111, -103, -39, -74, -127, -53, -8, 120, 55, -20, -103, 108, 109, -96, 68, 83, 114, -122, 16, -48, -58, -35, -75, -117, 49, -120, -123, -41, -40, 44, 127, -115, -21, 117, -50, 123, -44, -5, -86, 55, 8, -98, 111, -100, 96, 89, -13, -120, -125, -114, 122, 0, 3, 11, 51, 30, -73, 104, 64, -111, 4, 64, -79, -78, 122, -82, -82, -21, 64, 18, -73, -41, 102, 82, 56, -88, -29, -5, 0, 75, 17, 123, 88]

Byte Zero printed in binary: 00000000

Now, I understand that dropping that byte is important for some reason (although I am not sure why). Since the test vectors compute correctly with those two libraries, they are presumably programmed correctly. However, I do not understand why we need to drop that leading zero byte. What is the problem with it? If I drop the leading zero byte and create another BigInteger from the remaining bytes, I get a totally different number, in this case even a negative one. So dropping that zero byte does not make any sense to me. Any explanation is welcome.


Solution

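  • The "unsigned" in the name is perhaps a bit misleading; it's not the dropping of the 0 byte which makes it unsigned. The name just means the method assumes the BigInteger holds an unsigned (non-negative) number and returns its bytes without the extra sign byte that the signed encoding adds.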

    The 0 byte being dropped in these cases doesn't change the value, just as 01 or 001 are the same value as 1.
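A minimal sketch of this point, assuming a small made-up value (0xBD0C, which starts with the same two bytes as B) and a hypothetical helper `stripSignByte` that belongs to neither library. It shows that the stripped bytes still encode the same number as long as they are read back as an unsigned magnitude with `new BigInteger(1, bytes)`; the negative number in the question comes from the one-argument constructor, which reads the top bit as a sign bit.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not taken from either library.
final class UnsignedRoundTrip {

    // Drop the single sign byte that BigInteger.toByteArray() prepends
    // when the top bit of the magnitude is set.
    static byte[] stripSignByte(BigInteger value) {
        byte[] bytes = value.toByteArray();
        if (bytes.length > 1 && bytes[0] == 0) {
            return Arrays.copyOfRange(bytes, 1, bytes.length);
        }
        return bytes;
    }

    public static void main(String[] args) {
        BigInteger b = new BigInteger("BD0C", 16);     // top bit of 0xBD is set
        byte[] signed = b.toByteArray();               // two's complement, with sign byte
        byte[] unsigned = stripSignByte(b);            // magnitude only

        System.out.println(Arrays.toString(signed));   // [0, -67, 12]
        System.out.println(Arrays.toString(unsigned)); // [-67, 12]

        // One-argument constructor: bytes are two's complement, so 0xBD0C reads as negative.
        System.out.println(new BigInteger(unsigned));  // -17140

        // (signum, magnitude) constructor: bytes are an unsigned magnitude.
        System.out.println(new BigInteger(1, unsigned).equals(b)); // true
    }
}
```

The `bytes.length > 1` guard is a choice made for this sketch so that zero stays as a single 0 byte; the library methods quoted in the question handle that case in their own way.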

    Dropping the zero matters for several reasons:

    1. Not wasting space with unnecessary 0-bytes.
    2. Making the representation consistent when doing comparisons of the byte arrays.
    3. (And most relevant in the contexts you're referring to) the hash of a byte array with an extra 0 in front will not be the same as the hash of a byte array without the extra 0. The hash function doesn't know after all that this is a number and that the 0 is meaningless in this case. Imagine if this were a file, with the bytes 0:1:2:3 vs a file with the bytes 1:2:3. You wouldn't expect the hash of files with different lengths to be the same.
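The third point can be checked with a short sketch, assuming SHA-1 as the hash (the class and method names are made up for illustration). It hashes the two byte sequences from the file analogy and compares the digests.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, not part of Bouncy Castle or Nimbus SRP.
final class LeadingZeroHash {

    // SHA-1 digest of the given bytes; SHA-1 is available on every standard Java platform.
    static byte[] sha1(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withZero = {0, 1, 2, 3};
        byte[] withoutZero = {1, 2, 3};

        // Same number when read as a big-endian integer, but different hash inputs.
        System.out.println(Arrays.equals(sha1(withZero), sha1(withoutZero))); // false
    }
}
```

So both sides of a protocol have to agree on one byte encoding of each integer before hashing, otherwise they derive different values from the same numbers.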

    Note also that whether 0-bytes are to be removed from the start or the end depends on the endianness of the integer representation.

    UPDATE: Clarification of the removal of 0 bytes:

    Whilst removing a 0-byte from the start or end of any old byte array would change the value, in the cases you're referring to we're talking about the representation of an integer. If the 0-byte has importance, e.g. you want to round-trip some binary data, it would not be appropriate to load that binary data into a BigInteger class. I refer to my original example, you wouldn't consider 1 and 01 to be different numbers would you (though you would consider them to be different strings)?
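As a small illustration of why BigInteger is the wrong container for arbitrary binary data, the following sketch (with a made-up helper name, `throughBigInteger`) pushes a byte array with leading zeros through a BigInteger and back. The zeros do not survive, because they carry no numeric meaning.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical demo class for illustration only.
final class BinaryDataRoundTrip {

    // Interpret the bytes as an unsigned big-endian integer, then encode it again.
    static byte[] throughBigInteger(byte[] data) {
        return new BigInteger(1, data).toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 1, 2};
        // The number is 0x0102 = 258; the leading zeros are not part of it.
        System.out.println(Arrays.toString(throughBigInteger(data))); // [1, 2]
    }
}
```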

    UPDATE: Clarification on endianness:

    Integers may be represented in different ways in memory. If you saw the number 20 (in ordinary decimal), you know that the 2 refers to the number of tens, but that is just a convention. We might write twenty backwards as 02 and put the largest units at the end of the number. Similarly in a computer, the order of the digits can be the way we are normally familiar with them, or they can be "backwards". Given that, the 0s which don't affect the value of the number may be either at the start or at the end of the array of bytes, and we have to know when dealing with an array of bytes which way round the bytes should be "read".
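To make the endianness point concrete, here is a sketch that takes the big-endian bytes Java produces and reverses them into a little-endian layout. The `reverse` helper is written for this example; `BigInteger` itself only reads and writes big-endian arrays. The insignificant zero byte moves from the start of the array to the end.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical demo class for illustration only.
final class EndiannessDemo {

    // Return a copy of the array with the byte order reversed.
    static byte[] reverse(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return out;
    }

    public static void main(String[] args) {
        // BigInteger.toByteArray() is big-endian: most significant byte first.
        byte[] bigEndian = new BigInteger("BD0C", 16).toByteArray();
        byte[] littleEndian = reverse(bigEndian);

        System.out.println(Arrays.toString(bigEndian));    // [0, -67, 12]
        System.out.println(Arrays.toString(littleEndian)); // [12, -67, 0]
    }
}
```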