
ASCII file encoding with 16-bit bytes


According to the C standard, a byte can have more than 8 bits. ASCII characters need only 7 bits (8 in practice), so how is an ASCII (or UTF-8) file encoded on a system with, e.g., 16-bit bytes? Does each character take up 16 bits, or are two characters concatenated into one byte?

For example, given the following code:

char character;
FILE *file = fopen("file.txt", "r");
if (file != NULL) {
    fread(&character, 1, 1, file);  /* one element of size 1: a single C byte */
    fclose(file);
}

If the file is ASCII-encoded and contains the text ab, does character contain 'a' or some concatenation of 'a' and 'b'?


Solution

  • A system having 16-bit bytes doesn't change the definition of ASCII or UTF-8. Such a system will either have to pad the unused upper bits of each byte with zeros or, where that is meaningful, use a 16-bit symbol table. Storing two characters in one 16-bit variable sounds problematic.

    Systems like this are mostly DSPs and similar devices, which often have little in the way of a user interface and therefore little need for string handling.
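
    To make the two layouts concrete, here is a minimal sketch that simulates them on an ordinary machine. It is not how any particular DSP toolchain behaves; the names unit_t, decode_padded and decode_packed are hypothetical, an unsigned int stands in for one 16-bit byte, and the high-octet-first order in the packed case is an assumption. Which layout a real system uses is up to its implementation and tools.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for one "wide byte"; unsigned int has at least 16 bits. */
typedef unsigned int unit_t;

/* Nonzero when this implementation's byte is an octet (the usual case). */
int byte_is_octet(void)
{
    return CHAR_BIT == 8;
}

/* Layout A: one character per unit, upper bits zero-padded.
 * Writes n_units characters to out and returns that count. */
size_t decode_padded(const unit_t *units, size_t n_units, char *out)
{
    assert(units != NULL && out != NULL);
    for (size_t i = 0; i < n_units; i++) {
        out[i] = (char)(units[i] & 0xFFu);
    }
    return n_units;
}

/* Layout B: two octets packed into each unit, first character in the
 * high octet (assumed order). Writes 2 * n_units characters to out. */
size_t decode_packed(const unit_t *units, size_t n_units, char *out)
{
    assert(units != NULL && out != NULL);
    for (size_t i = 0; i < n_units; i++) {
        out[2 * i]     = (char)((units[i] >> 8) & 0xFFu);
        out[2 * i + 1] = (char)(units[i] & 0xFFu);
    }
    return 2 * n_units;
}
```

    With layout A, the text ab occupies two units (0x0061, 0x0062) and a one-byte read yields 'a'. With layout B it occupies a single unit (0x6162), and a one-byte read yields both characters at once, which the program would then have to split itself.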