Is this a bug in MS CL's global optimization (/Og)?

Here is a very simplified program which reproduces the issue I faced in the real application:

#include "stdlib.h"
typedef unsigned short int shft;
struct xptr  {
    int layer;
    void *  addr;

    xptr() : layer(0), addr(NULL) {}
    xptr(int _layer, void *_addr) : layer(_layer), addr(_addr) {}

    xptr(const xptr& x) { layer = x.layer; addr = x.addr; }

/* Uncomment this to remove the bug */
/*   const xptr& operator= (const xptr& rhs) 
    {
        if (this != &rhs) {
            this->addr = rhs.addr;
            this->layer = rhs.layer;
        }
        return *this;
    }*/
};


struct n_dsc {
    xptr    pdsc;      
    xptr    ldsc;      
    xptr    rdsc;      
};


void dm_string_value_traverse(xptr node)
{    
    xptr p = node;    
    while (p.addr != 0)
    {        
        p = ((n_dsc*)p.addr)->rdsc;    
    }
}

int main()
{
    n_dsc n1, n2;

    n1.rdsc = xptr(0, &n2);
    xptr n = xptr(0, &n1);

    dm_string_value_traverse(n);
}

I absolutely could not understand why CL 2008 (with /Og /Zi /W4) generates the following assembly for the "dm_string_value_traverse" function:

    while (p.addr != 0)
004ABAC5  mov         eax,dword ptr [esp+10h] 
004ABAC9  add         esp,0Ch 
004ABACC  test        eax,eax 
004ABACE  je          dm_string_value_traverse+5Fh (4ABADFh) 
    {
        p = ((n_dsc*)p.addr)->rdsc;
004ABAD0  mov         ecx,dword ptr [eax+1Ch] 
004ABAD3  mov         dword ptr [esp],ecx 
004ABAD6  mov         eax,dword ptr [eax+20h] 
004ABAD9  mov         dword ptr [esp+4],eax 
004ABADD  jmp         dm_string_value_traverse+50h (4ABAD0h) ;NOTE THIS JMP TO THE ASSIGNMENT
    }

}

Note:

Condition (p.addr != 0) is checked only once! See 004ABADD jumps unconditionally to the 004ABAD0 (assignment).
Without /Og compiler generates right code.
If you uncomment copy constructor compiler will generate good code also.

Is it possible to understand why this is going on? Is there a way to workaround this?

Solution

This looks like an overzealous optimizer. I can confirm the behavior you describe with Visual Studio 2008 SP1, /Og, and /Fa for assembly output. This wouldn't be the first time for VC: try google visual c++ "/Og" site:support.microsoft.com.

One workaround is to iterate with an xptr pointer, instead of an xptr value. This also has the beneficial side-effect of reducing the number of bytes copied on each iteration from 8 (xptr value) to 4 (xptr pointer).

void dm_string_value_traverse(xptr node)
{    
    xptr *p = &node;    
    while (p->addr != 0)
    {        
        p = &(((n_dsc*)(p->addr))->rdsc);    
    }
}

The resulting assembly code, with /Og, now looks like this. I can't map the assembly exactly to the source code, as the comparison (p->addr != 0) now happens in two places. It is clear, however, that the loop now includes a test for its end condition.

; prolog
      push  ebp
      mov   ebp, esp
; while (p->addr != 0)
      mov   eax, DWORD PTR _node$[ebp+4]
      test  eax, eax
      je    SHORT $LN1@dm_string_
      npad  6
; p = &(((n_dsc*)p->addr)->rdsc);
$LL2@dm_string_:
      mov   eax, DWORD PTR [eax+20]
      test  eax, eax
      jne   SHORT $LL2@dm_string_
$LN1@dm_string_:
; epilog
      pop   ebp
      ret   0

Given how intractable this class of bug is, there may be similar issues lurking invisible in the code. For the portion of the code that deals with xptr's and n_dsc's, you might want to consider going without /Og, or unit-testing it all around.