
Understanding Byte Order and Register Allocation: Little Endian vs Big Endian


I've been learning about big and little endian architectures and found that my PC is little endian. Using a simple program, I loaded the value 0xA000 into the AX register and observed that A0 was stored in AH and 00 in AL. My understanding is that AH represents the higher address (and that's why the MSB is stored in it), in line with the register sequence (AL, AH, EAX, RAX). Can someone confirm whether my understanding is correct so far?

Expanding on this, I'm curious what would happen on a big endian system. Assuming I loaded again the value 0xA000 into AX, now A0 is stored in AL (because in big endian the MSB goes to the lower address).

And now comes my problem:

If I were to add A to AX, I would see A00A inside.

However, if I were to add A to AL, and then inspect AX again, would I unexpectedly see AA instead of the expected value A00A? Could someone clarify the behavior in big endian systems and how they handle register allocation differently from little endian systems?


Solution

  • Assuming I loaded again the value 0xA000 into AX, now A0 is stored in AL (because in big endian MSB goes to the lower address).

    First, in your scenario there are no addresses, only values.

    Since 0xA000 is a value and we're not talking about memory addresses, when A000 is loaded into AX, the value A000 must be there, full stop.

    Second, your hypothetical big endian system would run counter to the notion of AH, AL, AX. AH's intended use is to store the higher-order byte of AX, while AL's intended use is to store the lower-order byte of AX.

    By definition, AX = (AH << 8) + AL, and equivalently, AH = AX >> 8 and AL = AX & 255.

    In your scenario, you're swapping the usages of AH and AL in the above formulas, but this is not what big vs. little endian does.
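    As a minimal sketch (C is chosen only for illustration, and the helper names get_ah, get_al, make_ax, set_al and add_al are hypothetical, not part of any real API), the register formulas can be modeled on a plain 16-bit value. Because only shifts and masks are used, the results are the same whether this code runs on a little-endian or a big-endian machine. add_al assumes x86 "add al, imm8" semantics: the sum wraps within 8 bits and never carries into AH (flags are not modeled).

```c
#include <stdint.h>
#include <assert.h>

/* AH = AX >> 8 : the high-order byte of the value */
static uint8_t get_ah(uint16_t ax) { return (uint8_t)(ax >> 8); }

/* AL = AX & 255 : the low-order byte of the value */
static uint8_t get_al(uint16_t ax) { return (uint8_t)(ax & 0xFF); }

/* AX = (AH << 8) + AL */
static uint16_t make_ax(uint8_t ah, uint8_t al) {
    return (uint16_t)((ah << 8) + al);
}

/* Writing AL replaces only the low-order byte; AH is preserved. */
static uint16_t set_al(uint16_t ax, uint8_t al) {
    return make_ax(get_ah(ax), al);
}

/* Hypothetical model of "add al, imm8": the 8-bit sum wraps and does
   not propagate into AH (the carry flag is not modeled here). */
static uint16_t add_al(uint16_t ax, uint8_t imm) {
    return set_al(ax, (uint8_t)(get_al(ax) + imm));
}
```

    For example, add_al(0xA000, 0x0A) returns 0xA00A, which is the value the question expects, and add_al(0xA0FF, 0x01) returns 0xA000 because the carry out of AL does not reach AH.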

    Endianness affects only the order in which the bytes of multi-byte words are stored in memory. The formulas for that refer to addresses as follows (where word is the value of a 2-byte item, mem is some memory address, and mem[n] is a single 8-bit byte of memory):

    word = mem[0] + (mem[1] << 8), and mem[0] = word & 255, mem[1] = word >> 8 (this is little endian).

    If you swap 0 and 1 in the above formulas, you have big endian. Big endian (vs. little endian) changes the memory storage order, but once in registers, values are just values.
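    The memory formulas can be sketched the same way, assuming a hypothetical 2-byte buffer mem that stands in for memory; store_le16, load_le16, store_be16 and load_be16 are illustrative names, not library functions. Storing 0xA000 places the bytes in opposite orders (00 A0 for little endian, A0 00 for big endian), yet each load recovers the same value 0xA000: only the order in memory differs.

```c
#include <stdint.h>
#include <assert.h>

/* Little endian: mem[0] = word & 255, mem[1] = word >> 8 */
static void store_le16(uint8_t mem[2], uint16_t word) {
    mem[0] = (uint8_t)(word & 255);
    mem[1] = (uint8_t)(word >> 8);
}

/* Little endian: word = mem[0] + (mem[1] << 8) */
static uint16_t load_le16(const uint8_t mem[2]) {
    return (uint16_t)(mem[0] + (mem[1] << 8));
}

/* Big endian: the same formulas with indices 0 and 1 swapped */
static void store_be16(uint8_t mem[2], uint16_t word) {
    mem[1] = (uint8_t)(word & 255);
    mem[0] = (uint8_t)(word >> 8);
}

static uint16_t load_be16(const uint8_t mem[2]) {
    return (uint16_t)(mem[1] + (mem[0] << 8));
}
```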

    However, if I were to add A to AL, and then inspect AX again, would I unexpectedly see AA instead of the expected value A00A?

    No. AL is the low-order byte of AX, not the lowest address of AX. Memory has addresses, which are ordered, while registers have encodings instead, and those register encodings cannot be captured, passed as parameters, compared in value, or used/dereferenced like addresses can. So there is no notion of a lowest address among registers, since registers don't have addresses. (It would be pointless to give an ordering to the encodings, since encodings cannot be used like addresses, and it is only with the ordering of the storage of individual bytes that endianness comes into play.)
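    To make the difference between "low-order byte" and "lowest address" concrete, here is a hedged C sketch with hypothetical helper names. low_order_byte uses the register-style formula and gives 0x0A for 0xA00A on any machine. lowest_address_byte copies the value into memory with memcpy and reads the byte at the lowest address, which is 0x0A on a little-endian host and 0xA0 on a big-endian host. This only reveals the byte order of the host's memory; it says nothing about how registers are laid out.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Low-order byte: defined arithmetically, identical on every host. */
static uint8_t low_order_byte(uint16_t value) {
    return (uint8_t)(value & 0xFF);
}

/* Byte at the lowest address of the value's in-memory copy:
   depends on the host's byte order. */
static uint8_t lowest_address_byte(uint16_t value) {
    uint8_t bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int host_is_little_endian(void) {
    return lowest_address_byte(0x0001) == 0x01;
}
```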

    Could someone clarify the behavior in big endian systems and how they handle register allocation differently from little endian systems?

    It's best to think of registers as holding simple values unaffected by endianness, and when there are aliases (e.g. AL, AH, AX), to use the register formulas I've stated above (not the memory formulas).