In the MIX computer, a word is composed of five bytes and a sign. How is the sign represented in memory? Is it another byte, so that each word is really six bytes?
Thanks.
Your question is not quite clear. The MIX architecture specification doesn't prescribe an actual implementation; it only specifies the observable behavior.
The important thing is that in MIX, memory access is aligned to words. In some other architectures, such as x86, you can read a word starting at an arbitrary address, even a non-word-aligned one, but not in MIX. This means you can't access a "sign" in any way other than as the sign of the corresponding word. That in turn means that if someone wanted to implement MIX in hardware, 31 bits per word would be enough: 1 bit for the sign plus 5 "bytes" of 6 bits each. (This assumes a binary MIX, where a byte holds 64 values; Knuth allows anywhere from 64 to 100 values per byte.)
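A minimal sketch of that 31-bit layout, written in Python for readability. The bit positions (sign in bit 30, byte 1 in the highest 6 bits) and the helper names `pack_word`, `unpack_word` and `to_int` are assumptions for illustration; MIX itself does not prescribe any of them.

```python
# Hypothetical 31-bit layout (an assumption, not prescribed by MIX):
#   bit 30      : sign (1 means '-')
#   bits 29..0  : bytes 1..5, 6 bits each, byte 1 most significant
BYTE_BITS = 6
BYTE_MASK = (1 << BYTE_BITS) - 1   # 63, the largest value of a binary MIX byte
SIGN_BIT = 1 << (5 * BYTE_BITS)    # 1 << 30


def pack_word(negative: bool, data: list[int]) -> int:
    """Pack a sign and five 6-bit bytes into one 31-bit integer."""
    if len(data) != 5 or any(not 0 <= b <= BYTE_MASK for b in data):
        raise ValueError("need exactly five bytes in 0..63")
    magnitude = 0
    for b in data:  # byte 1 first, so it ends up in the highest 6 bits
        magnitude = (magnitude << BYTE_BITS) | b
    return (SIGN_BIT if negative else 0) | magnitude


def unpack_word(word: int) -> tuple[bool, list[int]]:
    """Split a packed word back into (negative, [byte1, ..., byte5])."""
    negative = bool(word & SIGN_BIT)
    data = [(word >> (BYTE_BITS * (4 - i))) & BYTE_MASK for i in range(5)]
    return negative, data


def to_int(word: int) -> int:
    """Numeric value of a packed word (+0 and -0 both become 0 here)."""
    magnitude = word & (SIGN_BIT - 1)
    return -magnitude if word & SIGN_BIT else magnitude


w = pack_word(True, [1, 2, 3, 4, 5])
print(w.bit_length() <= 31, unpack_word(w), to_int(w))
```

Keeping the sign as its own bit, rather than folding it into a two's-complement value, is what lets the packed word preserve MIX's distinction between +0 and -0; only the `to_int` conversion loses it.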
If you want to emulate MIX on standard modern hardware that uses "bytes" that are multiples of 8 bits, you have a few choices:
Obviously, there are more contrived options.
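One of the simpler emulator choices is to give the sign its own byte, so each word occupies six 8-bit bytes, as the question suggests. The class name `MixWord`, the sign-byte encoding (0 for plus, 1 for minus) and the six-byte serialization are hypothetical; this is a sketch, not the only or best layout. The 4000-word memory size matches MIX's address range 0000-3999.

```python
from dataclasses import dataclass, field

MEMORY_SIZE = 4000  # MIX memory cells, addresses 0..3999


@dataclass
class MixWord:
    """One MIX word stored naively: a sign flag plus five byte values.

    Each MIX byte occupies a full 8-bit byte even though a binary MIX
    byte only needs 64 values (hypothetical emulator layout).
    """
    negative: bool = False
    data: bytearray = field(default_factory=lambda: bytearray(5))

    def __post_init__(self) -> None:
        if len(self.data) != 5 or any(b > 63 for b in self.data):
            raise ValueError("need exactly five bytes in 0..63")

    def to_int(self) -> int:
        """Numeric value (+0 and -0 both become 0 here)."""
        magnitude = 0
        for b in self.data:  # byte 1 is the most significant
            magnitude = magnitude * 64 + b
        return -magnitude if self.negative else magnitude

    def to_bytes(self) -> bytes:
        """Serialize as six 8-bit bytes: one sign byte, then bytes 1..5."""
        return bytes([1 if self.negative else 0]) + bytes(self.data)

    @classmethod
    def from_bytes(cls, raw: bytes) -> "MixWord":
        if len(raw) != 6:
            raise ValueError("expected six bytes")
        return cls(raw[0] != 0, bytearray(raw[1:]))


memory = [MixWord() for _ in range(MEMORY_SIZE)]
memory[1000] = MixWord(True, bytearray([1, 2, 3, 4, 5]))
print(memory[1000].to_int(), memory[1000].to_bytes())
```

This trades space (six bytes instead of the four a packed 31-bit word would need) for simpler access to individual bytes and the sign.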