c++windowsming

_BitScanForward64 returns wrong answer in c++.exe (rubenvb-4.7.2-release)


Long time MSVC user, new to gcc (so bear with me).

I am using the rubenvb version of c++ (see version in subject, yes I'm building for 64bit) on Windows 7 and I'm having a problem using _BitScanForward64. Some sample code looks like this:

int __cdecl main(int argc, char* argv[])
{
    DWORD d = (DWORD)atoi(argv[1]);

    DWORD ix, ix2;
    ix2 = _BitScanForward64(&ix, d);
    printf("bsf %u %u\n", ix, ix2);
}

I am compiling with:

"C:\Program Files\gcc2\mingw64\bin\c++.exe" -o iTot.exe -mno-ms-bitfields -march=native -momit-leaf-frame-pointer -mwin32 -Os -fomit-frame-pointer -m64 -msse4 -mpopcnt -D WINDOWS main.cpp

When I run iTot.exe using the parameter 8, I expected that _BitScanForward64 would set ix to 3. That's what MSVC does. However, ix is 0 and ix2 is 1.

Also, looking at the assembler, I see:

bsfq QWORD PTR 44[rsp],rax   # MEM[(volatile LONG64 *)&ix], Mask

Under the circumstances, why does gcc force a memory write+read here?

So, a few questions:

  1. Is _BitScanForward64 somehow supposed to be called differently under gcc? If I'm just calling it wrong, that would be good to know (although the incompatibility with MSVC would be a pain).
  2. Why does the _BitScanForward64 intrinsic force a memory write?
  3. Staring at the assembler output from -S, I couldn't see anything wrong with the code being generated. However, using objdump.exe -d -Mintel, I see that rather than using the asm code above (which seems like it would work), it actually produced the reverse:

    bsf rax,QWORD PTR [rsp+0x2c]

WTF? Why is -S lying to me?

Like I said, I'm new to gcc, so if I'm just doing something dumb, be gentle with me. Thanks.


Solution

  • Ok, I think I've answered my own questions. Thanks to Joachim PileBorg who made me look at where the definition was, and Alexey Frunze who pointed out that the params can't be backward.

    While I'm too new to gcc to say this authoritatively, I believe the definition for _BitScanForward64 in winnt.h is very wrong.

    The current definition:

    __CRT_INLINE BOOLEAN _BitScanForward64(DWORD *Index,DWORD64 Mask) {
      __asm__ __volatile__("bsfq %1,%0" : "=r" (Mask),"=m" ((*(volatile LONG64 *)Index)));
      return Mask!=0;
    }
    

    My definition:

    __CRT_INLINE BOOLEAN BSF(DWORD *Index,DWORD64 Mask) {
      LONG64 t;
      __asm__ ("bsfq %0,%1" : "=r" (Mask),"=r" (t));
      *Index = t;
      return Mask!=0;
    }
    

    Note the removal of the (unneeded) volatile, the reversal of the parameters to bsfq, the change from =m to =r, etc. Basically, it appears this definition is as wrong as it could be and still compile.

    I'm guessing the person who wrote this looked at the prototype for BitScanForward64 and "knew" that one of the parameters had to be memory, and since the only one that can be memory for BSF is p2, that's what they did. As written, the code will read the unwritten contents of p2 and scan it for bits. It compiles, but produces the wrong answer.

    So, to take my questions in order:

    1. No, I wasn't calling it incorrectly. The definition in winnt.h is just wrong. In fact, there's probably a bunch in that file that have a similar problem (_BitScanForward, _BitScanForward64, _BitScanReverse, _BitScanReverse64, etc).
    2. It forces a memory write because the code in winnt.h was wrong. My proposed change does not force any memory accesses.
    3. -S is writing the output file incorrectly (objdump has it right). Using my definition above produces:

      call    atoi
      lea rcx, .LC0[rip]
      /APP
      # 7 "m.cpp" 1
      bsfq rax,rdx
      /NO_APP
      call    printf
      

    And this isn't what is actually in the executable file. The actual executable file contains the (correct) definition:

    bsfq rdx,rax
    

    While I'm not excited about modifying system header files, it appears that's going to be my answer here. If anyone knows how/where to report this problem so it gets fixed (as I mentioned, I'm using reubenvb), I could report these 2 issues so (hopefully) this gets fixed for everyone.