This is the code in question:
struct Cell
{
Cell* U;
Cell* D;
void Detach();
};
void Cell::Detach()
{
U->D = D;
D->U = U;
}
clang-14 -O3 produces:
mov rax, qword ptr [rdi] <-- rax = U
mov rcx, qword ptr [rdi + 8] <-- rcx = D
mov qword ptr [rax + 8], rcx <-- U->D = D
mov rcx, qword ptr [rdi + 8] <-- this queries the D field again
mov qword ptr [rcx], rax <-- D->U = U
gcc 11.2 -O3 produces almost the same, but leaves out one mov
:
mov rdx, QWORD PTR [rdi]
mov rax, QWORD PTR [rdi+8]
mov QWORD PTR [rdx+8], rax
mov QWORD PTR [rax], rdx
Clang reads the D field twice, while GCC reads it only once and re-uses it. Apparently GCC is not afraid of the first assignment changing anything that has an impact on the second assignment. I'm trying to understand if/when this is allowed.
Checking correctness gets a bit complicated when U or D point at themselves, each other and/or the same target.
My understanding is that the shorter code of GCC is correct if it is guaranteed that the pointers point at the beginning of a Cell (never inside it), regardless of which Cell it is.
Following this line of thought further, this is the case when a) Cells are always aligned to their size, and b) no custom manipulation of such a pointer occurs (referencing and arithmetic are fine). I suspect case a) is guaranteed by the compiler, and case b) would require invoking undefined behavior of some sort, and as such can be ignored. This would explain why GCC allows itself this optimization.
Is my reasoning correct? If so, why does clang not make the same optimization?
Update: Fixed in clang 18
There are many potential optimizations in C and C++ that are usually safe, but aren't quite sound. If one regards the ->
operator as being usable to build a standard-layout object without having to use placement new on it first (an abstraction model that is relied upon by a lot of code, whether or not the Standard mandates support), removing the if (mode)
in the following C and C++ funcitons would be such an optimization.
C version:
struct s { int x,y; }; /* Assume int is 4 bytes, and struct is 8 */
void test(struct s *p1, struct s *p2, int mode)
{
p1->y = 1;
p2->x = 2;
if (mode)
p1->y = 1;
}
C++ version:
#include <new>
struct s { int x,y; };
void test(void *vp1, void *vp2, int mode)
{
if (1)
{
struct s* p1 = new (vp1) struct s;
p1->x = 1;
}
if (1)
{
struct s* p2 = new (vp2) struct s;
p2->y = 2;
}
if (mode)
{
struct s* p3 = new (vp1) struct s;
p3->x = 1;
}
}
The optimization would be correct unless the address in p2 is four bytes higher than p1. Under the "traditional" abstraction model used in C or C++, if the address of p1
happens to be 0x1000
and that of p2
happens to be 0x1004
, the first assignment would cause addresses 0x1000-0x1007 to hold a struct s
, if it didn't already, whose second member (at address 0x1004) would equal 1. The second assignment, by overwriting that object, would end its lifetime and cause addresses 0x1004 to 0x100B to hold a struct s
whose first member would equal 2. The third assignment, if executed, would end the lifetime of that second object and re-create the first.
If the third assignment is executed, there would be an object at address 0x1000 whose second field (at address 0x1004) would hold the readable value 1. If the assignment is skipped, there would be an object at address 0x1004 whose first field would hold the value 2. Behavior would be defined in both cases, and a compiler that didn't know which case would apply would have to accommodate both of them by making the value at 0x1004 depend upon mode
.
As it happens, the authors of clang do not seem to have provided for that corner case, and thus omit the conditional check. While I think the Standard should use an abstraction model that would allow such optimization, while also supporting the common structure-creation pattern in situations that don't involve weird aliasing corner cases, I don't see any way of interpreting the Standard that would allow for such optimization without allowing compilers to arbitrarily break a large amount of existing code.
I don't think there's any general way of knowing when a decision by gcc or clang not to impose a particular optimization represents a recognition of potential corner cases where the optimization would be incorrect, and an inability to prove that none of them apply, and when it simply represents an oversight which may be "corrected" to as to replace correct behavior with an unsound optimization.