Tags: sdk, opencl, nvidia, amd-processor, ati

OpenCL code behavior is different for AMD vs NVIDIA cards


I have a constant at the top of my code...

__constant uint uintmaxx =  (uint)(  (((ulong)1)<<32) - 1 );

It compiles fine on AMD and NVIDIA OpenCL compilers... then executes.

(correct) on ATI cards it returns... 4294967295 (all 32 bits set to 1)

(wrong) on NVIDIA cards it returns... 2147483648 (only the 32nd bit set to 1)

I also tried -1 + 1<<32 and it worked on ATI but not NVIDIA.

What gives? Am I just missing something?

While I'm on the topic of OpenCL compiler differences, does anyone know a good resource that lists the compiler differences between AMD and NVIDIA?


Solution

  • OpenCL conveniently provides that for you already. You can use the predefined UINT_MAX in your kernel code and the implementation will guarantee that it holds the correct value.

    However, there is also nothing wrong with the method you use. The spec guarantees that uint is 32 bits and ulong is 64 bits, that integers are two's complement, and that everything not explicitly mentioned behaves exactly as written in the C99 spec.

    Even just this should work and give you the correct result: uint uintmaxx = -1;

    It seems that NVIDIA simply has a broken compiler; if not, I hope I'll be corrected on the issue. The really odd part is: how on earth does the 32nd bit end up as 1? Shifting left by 32 moves the original bit to the 33rd place, so what places a bit in the 32nd spot? The only explanation I can think of is that the compiler doesn't respect operator ordering at all and transforms the formula into (ulong)1 << (32-1) or something like that.

    You should probably file a bug report. But to be frank, considering that they hate OpenCL about as much as Microsoft hates OpenGL, if not more, I wouldn't anticipate fast response times.
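    To illustrate the point about UINT_MAX and the -1 trick, here is a small host-side C sketch (it assumes a platform where unsigned int is 32 bits, which is what the predefined UINT_MAX in OpenCL kernel code guarantees) showing that all three ways of writing the constant agree:

    ```c
    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>
    #include <assert.h>

    int main(void) {
        uint32_t a = UINT_MAX;              /* predefined macro; in an OpenCL
                                               kernel UINT_MAX is guaranteed
                                               to be the 32-bit maximum */
        uint32_t b = (uint32_t)-1;          /* two's-complement wraparound:
                                               -1 converts to all bits set */
        uint32_t c = (uint32_t)((((uint64_t)1) << 32) - 1);
                                            /* explicit 64-bit shift, then
                                               truncate to 32 bits */

        assert(a == b && b == c);
        printf("all equal: %u\n", a);       /* prints 4294967295 */
        return 0;
    }
    ```

    Any of the three forms is portable; the predefined UINT_MAX is simply the most readable.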