clang / gcc : Some inline assembly operands can be satisfied with multiple constraints, e.g., "rm", when an operand can be satisfied with either a register or a memory location. As an example, the 64 x 64 = 128 bit multiply:
__asm__ ("mulq %q3" : "=a" (rl), "=d" (rh) : "%0" (x), "rm" (y) : "cc")
The generated code appears to choose the memory constraint for operand 3. That would be fine if we were register starved, since it would avoid a spill, but there's less register pressure on x86-64 than on IA32. The assembly snippet generated (by clang), however, is:
movq %rcx, -8(%rbp)
## InlineAsm Start
mulq -8(%rbp)
## InlineAsm End
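Choosing a memory constraint is clearly pointless here: y is already in %rcx, and it is stored to the stack only to be read straight back by the multiply. Changing the constraint to "r" (y), however (forcing a register), we get: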
## InlineAsm Start
mulq %rcx
## InlineAsm End
as expected. These results are for clang / LLVM 3.2 (current Xcode release). The first question: Why would clang select the less efficient constraint in this case?
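For anyone wanting to reproduce this, here is a minimal sketch that wraps the statement above in a function. The names mul_64x64 and MUL_Y_CONSTRAINT are mine (hypothetical, not from any library), and the unsigned __int128 fallback is only an assumption added so the file still builds on non-x86-64 targets with GCC or clang:

```c
#include <stdint.h>

/* Hypothetical knob: build with -DMUL_Y_CONSTRAINT='"r"' to force a register. */
#ifndef MUL_Y_CONSTRAINT
#define MUL_Y_CONSTRAINT "rm"
#endif

/* 64 x 64 -> 128 bit unsigned multiply.
 * *rh receives the high 64 bits of the product, *rl the low 64 bits. */
static inline void mul_64x64(uint64_t x, uint64_t y,
                             uint64_t *rh, uint64_t *rl)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t lo, hi;
    /* Same statement as in the question; only the constraint on y varies. */
    __asm__ ("mulq %q3"
             : "=a" (lo), "=d" (hi)
             : "%0" (x), MUL_Y_CONSTRAINT (y)
             : "cc");
    *rh = hi;
    *rl = lo;
#else
    /* Assumed fallback (not part of the question): compiler's 128-bit type. */
    unsigned __int128 p = (unsigned __int128) x * y;
    *rh = (uint64_t) (p >> 64);
    *rl = (uint64_t) p;
#endif
}
```

Compiling a caller of this with -O2 -S, once as is and once with -DMUL_Y_CONSTRAINT='"r"', should make it possible to compare the two InlineAsm regions directly.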
Secondly, there is the less widely used, comma-separated, multiple alternative constraint syntax, "r,m" (y), which should evaluate the cost of each alternative and choose the one that results in less copying. This appears to work, but clang simply chooses the first alternative, as evidenced by reversing the order: "m,r" (y)
I could simply drop the "m" alternative, but that doesn't express the range of possible legal operands. This brings me to the second question: Have these issues been resolved, or at least acknowledged, in 3.3? I've tried looking through the LLVM dev archives, but I'd rather solicit some answers before unnecessarily restricting constraints further, or joining project discussions, etc.
I had a response on the cfe-dev (clang front end developers' list) from one of the developers:
LLVM currently always spills "rm" constraints in order to simplify the handling of inline asm in the backend (you can ask on llvmdev if you want details). I don't know of any plans to fix this in the near future.
So it's clearly a 'known' issue. One of the goals of clang is to correctly handle gcc's inline assembly syntax, amongst other extensions, which it does in this case - just not very efficiently. In short, this isn't a bug, per se.
Since this isn't a bug, I'm going to continue with the "r,m" constraint syntax. I figure that this is the best compromise for now. gcc will choose the best alternative, presumably a register where possible, and clang will force the use of a register by ignoring further options after the comma. If nothing else, it still preserves the semantic intent of the assembly statement, i.e., describing possible constraints, even if they are ignored.
A final note (20130715): This particular example will not compile with GCC if the "r,m" constraint appears on a single operand only; we would have to supply a matching alternative for each of the other operands, e.g.,
: "=a,a" (rl), "=d,d" (rh) : "%0,0" (x), "r,m" (y)
This is required for multiple alternative constraints with GCC. But we're getting into territory where GCC has been known to exhibit bugs in the past - whether or not this is true as of 4.8.1, I don't know. Clang works without the alternatives in the other constraints, which is incompatible with GCC syntax, and must therefore be considered a bug.
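Spelled out as a whole function, the GCC-compatible form might look like the sketch below. The function name mul_64x64_alt is hypothetical, and the unsigned __int128 fallback is an assumption added only so the file builds on other targets; I have not checked this against every GCC or clang release:

```c
#include <stdint.h>

/* 64 x 64 -> 128 bit unsigned multiply using multiple alternative
 * constraints. Every operand lists two alternatives, so alternative 1 is
 * (a, d, 0, r) and alternative 2 is (a, d, 0, m). */
static inline void mul_64x64_alt(uint64_t x, uint64_t y,
                                 uint64_t *rh, uint64_t *rl)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t lo, hi;
    __asm__ ("mulq %q3"
             : "=a,a" (lo), "=d,d" (hi)
             : "%0,0" (x), "r,m" (y)
             : "cc");
    *rh = hi;
    *rl = lo;
#else
    /* Assumed fallback (not part of the original): compiler's 128-bit type. */
    unsigned __int128 p = (unsigned __int128) x * y;
    *rh = (uint64_t) (p >> 64);
    *rl = (uint64_t) p;
#endif
}
```

Only the y operand really differs between the two alternatives; the others are repeated so that each operand has the same number of alternatives.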
If performance is critical, use "r"; otherwise, stick with "rm", which already benefits GCC, and maybe clang will address this in the future.