I was introduced to the DAA instruction in our 8085 microprocessor course. It adds 06H to the lower nibble of the accumulator if that nibble is not a valid BCD digit or if the auxiliary carry flag is set, and adds 60H to the upper nibble if it is not a valid BCD digit or if the carry flag is set.
I wanted to compare the carry flag before and after a DAA operation; specifically, I wanted to know how the addition performed inside DAA affects the carry flag. So I wrote the following program:
```asm
mvi c,00h
mvi a,99h
mvi b,99h
add b
daa
jnc L1
inr c
L1: mov a,c
sta 8100h
hlt
```
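For anyone who wants to replay this experiment without a trainer kit, here is a rough Python model of the program. It is a sketch under assumptions: the function names (`add8`, `daa`, `run_program`) are hypothetical, flags are plain booleans, and `daa` follows the x86 DAA rules from Intel's manual, on the assumption that the 8085 behaves the same way.

```python
def add8(a, b):
    """8-bit ADD: return (result, CY, AC) as the 8085 ADD instruction sets them."""
    total = a + b
    cy = total > 0xFF
    ac = (a & 0x0F) + (b & 0x0F) > 0x0F  # carry from bit 3 into bit 4
    return total & 0xFF, cy, ac


def daa(a, cy, ac):
    """DAA modelled on Intel's x86 pseudocode (assumed to match the 8085)."""
    old_a, old_cy = a, cy
    if (a & 0x0F) > 9 or ac:
        a = (a + 0x06) & 0xFF
        ac = True
    else:
        ac = False
    if old_a > 0x99 or old_cy:
        a = (a + 0x60) & 0xFF
        cy = True
    else:
        cy = False
    return a, cy, ac


def run_program(a, b):
    """Mirror of: mvi c,00h / add b / daa / jnc L1 / inr c / L1: mov a,c / sta 8100h."""
    c = 0x00
    a, cy, ac = add8(a, b)
    a, cy, ac = daa(a, cy, ac)
    if cy:          # jnc L1 skips "inr c" only when CY = 0
        c += 1
    return c        # the byte stored at 8100H


for x, y in [(0x99, 0x99), (0x55, 0x99), (0xFF, 0xFF)]:
    print(f"{x:02X}H + {y:02X}H -> [8100H] = {run_program(x, y):02X}H")
```

Under this model, each of the three inputs used in the experiments (99H + 99H, 55H + 99H, FFH + FFH) leaves 01H at 8100H.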
I expected 8100H to contain 00H: CY is set after `add b` (A = 32H), but DAA then adds 66H without producing a carry (A = 98H), which I thought would reset the carry flag.
However, 8100H contained 01H, which means DAA preserved the carry flag.
Next I repeated the experiment, adding 55H to 99H. This time there was no carry before DAA, but after DAA the carry flag was set (8100H contained 01H). That makes sense, because the BCD result should be 154 (CY = 1, A = 54H). But this time DAA did not preserve the carry flag.
Then I deliberately added FFH + FFH to see what happens when there is a carry before DAA and DAA's own addition also produces a carry. The carry flag was set (8100H contained 01H).
I made a truth table out of my findings:
| CY before DAA | Carry from DAA's own addition | CY preserved? |
|---|---|---|
| No | No | Yes |
| No | Yes | No |
| Yes | No | Yes |
| Yes | Yes | Yes |
So DAA always preserves the carry, except when it generates a carry of its own and the carry before was 0? That suggests the addition inside DAA works differently from a normal ADD, which confuses me. Does anyone know why it works this way?
DAA sets CF if the preceding `add` had a carry-out, and/or if the accumulator is > 99h, so that normalizing the BCD digits needs to propagate a 1 into a higher place value. (E.g. if the high nibble is 0xB = 11, it becomes 1 and CF=1.) Otherwise CF=0 after DAA.
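One way to convince yourself of this CF rule is an exhaustive check. The Python sketch below (helper names are made up for illustration, and DAA follows Intel's pseudocode) adds every pair of two-digit BCD values, applies DAA, and asserts that the accumulator holds the low two decimal digits of the sum and CF holds the decimal carry.

```python
def to_bcd(n):
    """Pack a value 0..99 into one byte of two BCD digits (e.g. 42 -> 0x42)."""
    return (n // 10) << 4 | (n % 10)


def add_then_daa(x, y):
    """Binary ADD of two packed-BCD bytes followed by DAA; returns (AL, CF)."""
    total = x + y
    al, cf = total & 0xFF, total > 0xFF
    af = (x & 0x0F) + (y & 0x0F) > 0x0F
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:   # carry-out of the add, or accumulator above 99h
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf


for a in range(100):
    for b in range(100):
        al, cf = add_then_daa(to_bcd(a), to_bcd(b))
        assert al == to_bcd((a + b) % 100)
        assert cf == (a + b >= 100)
print("DAA gives the right BCD digits and decimal carry for all 10,000 pairs")
```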
The `add` can set CF in a case like BCD 90 + 90, represented as binary 0x90 + 0x90 = 0x120, truncated in the accumulator to 0x20 with CF=1, telling us there was binary wraparound in the high digit (the high digit is 2 because 9 + 9 = 18 = 0x12 doesn't fit in a nibble). We want DAA to produce 0x80 with a carry-out of 1, because the decimal sum we're computing is 90 + 90 = 180.
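The same 90 + 90 example in a few lines of Python (`to_bcd` is a hypothetical helper), comparing what the binary add leaves in the accumulator with what decimal arithmetic wants:

```python
def to_bcd(n):
    """Pack a value 0..99 as two BCD digits in one byte (e.g. 80 -> 0x80)."""
    return (n // 10) << 4 | (n % 10)


# Binary ADD of the packed-BCD bytes for 90 + 90
raw = 0x90 + 0x90                      # 0x120
acc, cf = raw & 0xFF, raw > 0xFF       # accumulator keeps only the low 8 bits
print(hex(acc), cf)                    # 0x20 True

# Target: decimal 180 is a carry-out of 1 plus the BCD digits "80"
decimal_sum = 90 + 90
want_acc, want_cf = to_bcd(decimal_sum % 100), decimal_sum >= 100
print(hex(want_acc), want_cf)          # 0x80 True

# CF=1 is what triggers DAA's high-nibble fixup here: adding 0x60 restores "80"
fixed = (acc + 0x60) & 0xFF
print(hex(fixed))                      # 0x80
```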
`daa` is intended for use after `add`, so I'm only going to talk about that case. It does of course have well-defined rules for every input, so you can work out what it does even for combinations of accumulator value and FLAGS that `add` can't produce, especially with only normalized BCD inputs (at most 0x99, not e.g. 0xAB + 0xCD).
8086 has the same instruction (see footnote 1); https://www.felixcloutier.com/x86/daa#operation is detailed pseudocode for it from Intel's manual, which shows how it reads CF, AF and the accumulator as inputs and writes them as outputs.
8085's A register maps to 8086 AL. CF and AF are the Carry and Aux-carry flags respectively (CY and AC in 8085 terminology); DAA reads both as inputs. Your truth table omits AF, as well as the accumulator's input value. (AF is set by `add` if there's a carry from bit #3 to bit #4, across the low/high nibble boundary.)
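To fill in those missing columns, this small Python sketch (the `add_flags` name is hypothetical) computes the A, CY and AC values that `add` hands to DAA for the three experiments in the question; AC is computed as the carry out of the low nibble:

```python
def add_flags(a, b):
    """Return the DAA inputs (A, CY, AC) that an 8-bit ADD leaves behind.
    AC (x86: AF) is the carry out of bit 3 into bit 4, i.e. out of the low nibble."""
    total = a + b
    return total & 0xFF, total > 0xFF, (a & 0x0F) + (b & 0x0F) > 0x0F


for a, b in [(0x99, 0x99), (0x55, 0x99), (0xFF, 0xFF)]:
    acc, cy, ac = add_flags(a, b)
    print(f"{a:02X}H + {b:02X}H: A = {acc:02X}H, CY = {int(cy)}, AC = {int(ac)}")
```

For 99H + 99H and FFH + FFH, AC = 1 forces the low-nibble fixup, and for 55H + 99H the low nibble EH > 9 does. The high-nibble fixup comes from CY = 1 for 99H + 99H, from A > 99H for 55H + 99H, and from both for FFH + FFH.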
```
old_AL := AL;
old_CF := CF;
CF := 0;
IF (((AL AND 0FH) > 9) or AF = 1)  // normalize low BCD digit in low nibble
    THEN
        AL := AL + 6;  // emulate wrapping at 10 instead of 16
        CF := old_CF or (Carry from AL := AL + 6);  // this seems to be useless; the next if/else doesn't read this, and both sides unconditionally overwrite CF
        AF := 1;
    ELSE
        AF := 0;
FI;
IF ((old_AL > 99H) or (old_CF = 1))  // Then fixup the high nibble digit.
    THEN
        AL := AL + 60H;
        CF := 1;
    ELSE
        CF := 0;
FI;
```
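If you want to try inputs yourself, here is a direct Python transliteration of that pseudocode. It is only a sketch: it tracks AL, CF and AF as an integer and two booleans, and ignores SF/ZF/PF.

```python
def daa(al, cf, af):
    """Transliteration of Intel's DAA pseudocode; returns (AL, CF, AF)."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        carry = al + 6 > 0xFF            # "Carry from AL := AL + 6"
        al = (al + 6) & 0xFF
        cf = old_cf or carry             # overwritten below either way
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af
```

For example, the question's first experiment leaves AL = 0x32 with CF = AF = 1 after `add`, and `daa(0x32, True, True)` returns `(0x98, True, True)`: the carry survives because `old_CF = 1` selects the high-nibble branch.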
`old_AL > 99h` catches cases like 46h + 56h = 9Ch, where the low-nibble fixup produces 0A2h, which then needs a high-nibble fixup. So it's like checking the result of that +6 addition, but without a data dependency on it.
For example: `mov al, 0x46` followed by `add al, 0x56` produces AL = 0x9C with AF=CF=0; `daa` then produces AL = 0x02 with CF=1 (and AF=1 to indicate it fixed up the low digit).
It got there by adding 6, producing 0xA2, then adding 0x60, producing 0x02.
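A step-by-step trace of this example, as a Python sketch that follows Intel's pseudocode (the `trace_add_daa` helper is hypothetical):

```python
def trace_add_daa(x, y):
    """Return a list of trace lines for ADD x, y followed by DAA (Intel pseudocode rules)."""
    total = x + y
    al, cf, af = total & 0xFF, total > 0xFF, (x & 0x0F) + (y & 0x0F) > 0x0F
    steps = [f"add: AL={al:02X}h CF={int(cf)} AF={int(af)}"]
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:            # low-nibble fixup
        al = (al + 0x06) & 0xFF
        af = True
        steps.append(f"+06h -> AL={al:02X}h")
    else:
        af = False
    if old_al > 0x99 or old_cf:          # high-nibble fixup
        al = (al + 0x60) & 0xFF
        cf = True
        steps.append(f"+60h -> AL={al:02X}h")
    else:
        cf = False
    steps.append(f"daa: AL={al:02X}h CF={int(cf)} AF={int(af)}")
    return steps


for line in trace_add_daa(0x46, 0x56):
    print(line)
```

This prints `add: AL=9Ch CF=0 AF=0`, `+06h -> AL=A2h`, `+60h -> AL=02h` and `daa: AL=02h CF=1 AF=1`.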
AMD documents it in words, not pseudocode, in volume 3 of their AMD64 Architecture Programmer’s Manual. The logic for deciding what to add is the same as in Intel's pseudocode, but they describe the FLAGS-setting differently.
> DAA Decimal Adjust after Addition
>
> Adjusts the value in the AL register into a packed BCD result and sets the CF and AF flags in the rFLAGS register to indicate a decimal carry out of either nibble of AL.
>
> Use this instruction to adjust the result of a byte ADD instruction that performed the binary addition of one 2-digit packed BCD values to another.
>
> The instruction performs the adjustment by adding 06h to AL if the lower nibble is greater than 9 or if AF = 1. Then 60h is added to AL if the original AL was greater than 99h or if CF = 1.
>
> If the lower nibble of AL was adjusted, the AF flag is set to 1. Otherwise AF is not modified. If the upper nibble of AL was adjusted, the CF flag is set to 1. Otherwise, CF is not modified. SF, ZF, and PF are set according to the final value of AL.
(This "unmodified" language differs from Intel's pseudocode, which sets CF to 0 if the upper nibble wasn't fixed up. But that can only happen if the incoming CF was already 0, so it's the same as leaving it unmodified.)
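That equivalence can be brute-forced. The Python sketch below encodes both descriptions (function names are hypothetical; flags are booleans; SF/ZF/PF are ignored) and compares them for every AL value and every CF/AF combination:

```python
from itertools import product


def daa_intel(al, cf, af):
    """Intel-style: AF and CF are explicitly cleared when no fixup happens."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al, af = (al + 0x06) & 0xFF, True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al, cf = (al + 0x60) & 0xFF, True
    else:
        cf = False
    return al, cf, af


def daa_amd(al, cf, af):
    """AMD-style wording: set a flag when its nibble is adjusted, else leave it unmodified."""
    old_al = al
    if (al & 0x0F) > 9 or af:
        al, af = (al + 0x06) & 0xFF, True
    if old_al > 0x99 or cf:
        al, cf = (al + 0x60) & 0xFF, True
    return al, cf, af


mismatches = [inp for inp in product(range(256), (False, True), (False, True))
              if daa_intel(*inp) != daa_amd(*inp)]
print(len(mismatches))
```

It reports zero mismatches. The AMD wording also makes the question's truth table easy to read: the final CF is the incoming CF ORed with "the high nibble was adjusted".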
Footnote 1: The 8086 ISA was designed to allow automatic translation from 8080 / 8085 asm source, so it's very likely that the instruction has identical semantics on both.
https://www.righto.com/2013/08/reverse-engineering-8085s-decimal.html is Ken Shirriff's reverse-engineering of how DAA is physically implemented in the 8085:

> The DAA operation in the 8085 is implemented by several components: a signal if the lower bits of the accumulator are ≥ 10, a signal if the upper bits are ≥ 10 (including any half carry from the lower bits), and circuits to load the ACT register with the proper correction constant 0x00, 0x06, 0x60, or 0x66. The DAA operation then simply uses the ALU to add the proper correction constant.
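That single-addition view also explains the truth table in the question. The Python sketch below builds the correction constant and adds it once, then checks it against the two-step pseudocode for all 1024 combinations of AL, CF and AF. Two parts are assumptions rather than anything Shirriff states in this form: our reading of the "upper bits ≥ 10 including half carry" signal, and the carry rule (keep an incoming carry, otherwise use the adder's carry-out), which is simply the pattern the questioner observed. Function names are hypothetical.

```python
from itertools import product


def daa_two_step(al, cf, af):
    """Intel pseudocode: +06h, then +60h as separate steps. Returns (AL, CF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:
        return (al + 0x60) & 0xFF, True
    return al, False


def daa_one_add(al, cf, af):
    """One addition of a correction constant (00h, 06h, 60h or 66h).
    Assumed reading of the "upper >= 10 including half carry" signal:
    high nibble >= 10, or high nibble == 9 while the low nibble >= 10.
    Returns (AL, CF, correction)."""
    low, high = al & 0x0F, al >> 4
    correction = 0x00
    if low >= 10 or af:
        correction |= 0x06
    if cf or high >= 10 or (high == 9 and low >= 10):
        correction |= 0x60
    total = al + correction
    # Assumed CF rule: keep an incoming carry, otherwise take the adder's carry-out
    return total & 0xFF, cf or total > 0xFF, correction


for al, cf, af in product(range(256), (False, True), (False, True)):
    new_al, new_cf, _ = daa_one_add(al, cf, af)
    assert (new_al, new_cf) == daa_two_step(al, cf, af)
print("single-add model matches the two-step pseudocode for all 1024 inputs")
```

If the loop completes, that observed rule agrees with Intel's pseudocode on every input: whenever the high-nibble fixup is triggered by the accumulator value rather than by an incoming carry, the correction add itself carries out of bit 7.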
I wouldn't be surprised if some x86 CPUs internally work out which constant to add and do only one addition; the pseudocode isn't the implementation, just an as-if equivalent. It was a single-uop instruction until Sandy Bridge, then a 3-uop instruction (4 on Pentium 4). AMD microcodes it as 16 uops in the Bulldozer/Zen families, up from 12 in K8/K10, presumably having no specialized hardware support for it at all.