Parsing LUKS Headers to Read Integer Value Fields Correctly

I'm trying to parse a LUKS header by reading the raw data off a device that holds a LUKS volume, following the specification given here: https://gitlab.com/cryptsetup/cryptsetup/wikis/LUKS-standard/on-disk-format.pdf. I'm working from the table on page 6, which shows the data that resides at each location, what type of data it is, and how many of that data type make up a single value.

For instance, the hash-spec string resides at offset 72 and consists of 32 chars. Collecting this into an array and printing the result is simple. However, as detailed in the table, numerical values such as the version or the key-bytes (which is supposedly the length of the key) span multiple integers: the version has two unsigned shorts and the key-bytes has four unsigned ints to represent their values.

I'm somewhat confused by this and by how I should interpret it to retrieve the correct value. I wrote a messy test script to scan a USB stick encrypted with LUKS and display what is retrieved from reading these fields:

version:
256
25953
hash spec:
sha256
key bytes (length):
1073741824
3303950314
1405855026
1284286704

This is very confusing. The hash spec field again holds the expected value, just the string of characters itself, but how am I supposed to interpret either the version or the key-bytes field? Both look like completely random numbers, and as far as I can tell nothing in the spec explains this. I figured this might be a problem with how I wrote the code, so below is the script used to display these values:

#include <stdio.h>

int main()  {

    unsigned short data[100];
    unsigned char data2[100];
    unsigned int data3[100];
    int i;

    FILE *fp;

    fp = fopen("/dev/sdd1", "rb");

    if (fp) {
        /* Seek only once we know the device was opened successfully. */
        fseek(fp, 6, SEEK_SET);

        for (i=0; i < 2; i++)   {
            fread(&data[i], sizeof(short), 1, fp);
        }

        fseek(fp, 72, SEEK_SET);

        for (i=0; i < 32; i++)  {
            fread(&data2[i], sizeof(char), 1, fp);
        }

        fseek(fp, 108, SEEK_SET);

        for (i=0; i < 4; i++)   {
            fread(&data3[i], sizeof(int), 1, fp);
        }

        printf("version:\n");
        for (i=0; i < 2; i++)   {
            printf("%u\n", data[i]);
        }
        printf("hash spec:\n");
        for (i=0; i < 32; i++)  {
            printf("%c", data2[i]);
        }
        printf("\n");
        printf("key bytes (length):\n");
        for(i=0; i < 4; i++)    {
            printf("%u\n", data3[i]);
        }

        fclose(fp);
    }
    else {
        printf("error\n");
    }

    return 0;
}

Any help would be appreciated, thanks.

Solution

The problem is that the data you're reading is big-endian, but the computer you're running on is little-endian. For example, the value you're printing as 1073741824 is stored as the bytes 0x00, 0x00, 0x00, and 0x40, in that order. Read as a big-endian number, that's 0x00000040, or 64. Read as a little-endian number, as is usual on x86 systems, that's 0x40000000, an absurdly large key length.
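
To make the two interpretations concrete, here is a minimal sketch that assembles the same four bytes both ways. The helper names from_big_endian and from_little_endian are made up for this illustration; they are not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: treat b[0] as the MOST significant byte. */
uint32_t from_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Hypothetical helper: treat b[0] as the LEAST significant byte. */
uint32_t from_little_endian(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

Called on the bytes 0x00, 0x00, 0x00, 0x40, from_big_endian returns 64 and from_little_endian returns 1073741824, which is the first "key bytes" number in your output.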

Fortunately, there are functions that can convert these values for you. To convert a 32-bit big-endian (network byte order) value to your system's (host byte order) format, use ntohl; for a 16-bit value, use ntohs. On POSIX systems both are declared in <arpa/inet.h>.

So when you read the data for the 16-bit integers, it would look like this:

for (i=0; i < 2; i++)   {
    fread(&data[i], sizeof(short), 1, fp);
    data[i] = ntohs(data[i]);
}
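
Putting this together for both fields, here is a rough sketch that works on a buffer holding the start of the device. It rests on an assumption from my reading of the spec's table: the length column counts bytes, so the version would be a single 16-bit integer at offset 6 and key-bytes a single 32-bit integer at offset 108. If that is right, it would also explain why the second "version" number and the last three "key bytes" numbers look random, since they would belong to the neighbouring fields. The function and macro names are hypothetical.

```c
#include <arpa/inet.h>  /* ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Offsets as used in the question; the macro names are made up. */
#define LUKS_VERSION_OFFSET   6
#define LUKS_KEY_BYTES_OFFSET 108

/* Both functions assume hdr holds at least the first 112 bytes
   of the device. memcpy avoids unaligned access into the buffer. */
uint16_t luks_version(const unsigned char *hdr)
{
    uint16_t raw;
    memcpy(&raw, hdr + LUKS_VERSION_OFFSET, sizeof raw);
    return ntohs(raw);   /* big-endian on disk -> host order */
}

uint32_t luks_key_bytes(const unsigned char *hdr)
{
    uint32_t raw;
    memcpy(&raw, hdr + LUKS_KEY_BYTES_OFFSET, sizeof raw);
    return ntohl(raw);   /* big-endian on disk -> host order */
}
```

You could fill the buffer with a single fread(hdr, 1, sizeof hdr, fp) from the start of the device and then call both functions on it, instead of seeking to each field separately.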

As a side note, if you're going to be working with values of fixed sizes, it's a little more portable and easier to understand if you do #include <stdint.h> and then use the types uint8_t, uint16_t, and uint32_t. These types have the same width on every platform that provides them, whereas the sizes of built-in types such as short and int can vary between platforms.
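
As an illustration of the fixed-width types, here is one possible pair of helpers that read a big-endian value straight from the file and check the result of every I/O call. They shift the bytes into place instead of calling ntohs/ntohl, which is an alternative that does not depend on the host's byte order. The names read_be16_at and read_be32_at are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a big-endian 16-bit value at a byte offset.
   Returns 0 on success, -1 if the seek or the read fails. */
int read_be16_at(FILE *fp, long offset, uint16_t *out)
{
    uint8_t b[2];

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;

    *out = (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
    return 0;
}

/* Same idea for a big-endian 32-bit value. */
int read_be32_at(FILE *fp, long offset, uint32_t *out)
{
    uint8_t b[4];

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;

    /* b[0] is the most significant byte, regardless of the host. */
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

With these, something like read_be32_at(fp, 108, &key_bytes) would replace the seek-and-loop for the key-bytes field, and a short read shows up as an error instead of leaving stale data in the variable.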

If you're interested in reading more about endianness, Wikipedia has an article on it.