My global array contains elements of type struct {float, float}. The first thing I do to it is a 64-bit CAS on one of the structs. Depending on the return value, I may want to modify the second float. For that second update I can use either a 32-bit CAS or a 64-bit CAS. I know (from the return value of the first CAS) that the first float will not change value again.
Is it safe to combine a 64-bit CAS and a 32-bit CAS on the same 64 bits of memory?
Yes, it is safe. Those atomic operations will "serialize", just like any other collection of atomics would.
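To make the pattern concrete, here is a minimal sketch of what I understand you to be doing. It is not your actual code: the kernel name, the "empty slot is {0, 0}" convention, and the key/delta parameters are all assumptions for illustration. It uses float2 so that the struct is 8-byte aligned, which the 64-bit atomicCAS needs, and it assumes the usual little-endian layout (.x in the low 32 bits). Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pack two floats into one 64-bit word (.x in the low half, .y in the high half).
__device__ unsigned long long pack(float x, float y)
{
    return ((unsigned long long)__float_as_uint(y) << 32) | __float_as_uint(x);
}

// Hypothetical protocol: a slot starts as {0, 0}. The first thread to arrive
// claims it by writing {key, 0}. After that, only .y is ever modified.
__global__ void claim_and_update(float2 *slots, float key, float delta)
{
    float2 *slot = &slots[0];

    // Step 1: 64-bit CAS on the whole struct.
    unsigned long long empty = pack(0.0f, 0.0f);
    unsigned long long old =
        atomicCAS((unsigned long long *)slot, empty, pack(key, 0.0f));

    // Inspect the first float of the returned value.
    float old_x = __uint_as_float((unsigned int)(old & 0xffffffffull));
    if (old != empty && old_x != key) return;  // slot holds a different key

    // Step 2: 32-bit CAS loop on the second float only.
    unsigned int *y_addr = (unsigned int *)&slot->y;
    unsigned int seen = *y_addr;
    unsigned int assumed;
    do {
        assumed = seen;
        unsigned int desired = __float_as_uint(__uint_as_float(assumed) + delta);
        seen = atomicCAS(y_addr, assumed, desired);
    } while (seen != assumed);  // retry if another thread changed .y in between
}

// Launches 256 threads on one slot and checks the final state on the host.
int run_demo()
{
    float2 *d_slots;
    cudaMalloc(&d_slots, sizeof(float2));
    cudaMemset(d_slots, 0, sizeof(float2));

    claim_and_update<<<1, 256>>>(d_slots, 1.0f, 1.0f);

    float2 h;
    cudaMemcpy(&h, d_slots, sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(d_slots);

    printf("slot = {%f, %f}\n", h.x, h.y);
    return (h.x == 1.0f && h.y == 256.0f) ? 0 : 1;
}

int main() { return run_demo(); }
```

With these inputs every thread either claims the slot or finds it already claimed with the same key, so all 256 threads add 1.0 to .y and the final state should be {1.0, 256.0}.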
Are there performance considerations to pick one or the other?
NVIDIA doesn't really provide throughput information or specifications for atomic activity, so it has to be determined experimentally. My expectation is that 32-bit atomics would not be slower than an equivalent number and distribution of 64-bit atomics, and they might be faster.
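If you want to measure it, something along these lines could serve as a starting point. This is only a rough sketch: the kernel names, grid size, slot count, and the modulo access pattern are my assumptions, not your workload, and a realistic test should reproduce your actual number and distribution of atomics. Both variants add to the second float, one with a 32-bit CAS loop and one with a 64-bit CAS loop over the whole struct. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant A: update .y with a 32-bit CAS loop.
__global__ void update_y_32(float2 *slots, int n_slots, float delta)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int *y_addr = (unsigned int *)&slots[tid % n_slots].y;
    unsigned int seen = *y_addr, assumed;
    do {
        assumed = seen;
        unsigned int desired = __float_as_uint(__uint_as_float(assumed) + delta);
        seen = atomicCAS(y_addr, assumed, desired);
    } while (seen != assumed);
}

// Variant B: update .y with a 64-bit CAS loop, leaving .x (low half) unchanged.
__global__ void update_y_64(float2 *slots, int n_slots, float delta)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned long long *addr = (unsigned long long *)&slots[tid % n_slots];
    unsigned long long seen = *addr, assumed;
    do {
        assumed = seen;
        float y = __uint_as_float((unsigned int)(assumed >> 32)) + delta;
        unsigned long long desired =
            ((unsigned long long)__float_as_uint(y) << 32) | (assumed & 0xffffffffull);
        seen = atomicCAS(addr, assumed, desired);
    } while (seen != assumed);
}

typedef void (*kernel_t)(float2 *, int, float);

// Times one launch with CUDA events and verifies the result on the host.
// Returns elapsed milliseconds, or a negative value if the result is wrong.
float time_kernel(kernel_t k, float2 *d_slots, int n_slots)
{
    const int blocks = 1024, threads = 256;  // arbitrary sizes for the sketch
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    k<<<blocks, threads>>>(d_slots, n_slots, 1.0f);  // warm-up launch
    cudaMemset(d_slots, 0, n_slots * sizeof(float2));

    cudaEventRecord(start);
    k<<<blocks, threads>>>(d_slots, n_slots, 1.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    // Each slot should have received (blocks * threads) / n_slots additions.
    float2 h;
    cudaMemcpy(&h, d_slots, sizeof(float2), cudaMemcpyDeviceToHost);
    float expected = (float)(blocks * threads / n_slots);
    return (h.x == 0.0f && h.y == expected) ? ms : -1.0f;
}

int run_benchmark()
{
    const int n_slots = 1024;
    float2 *d_slots;
    cudaMalloc(&d_slots, n_slots * sizeof(float2));

    float ms32 = time_kernel(update_y_32, d_slots, n_slots);
    float ms64 = time_kernel(update_y_64, d_slots, n_slots);
    cudaFree(d_slots);

    printf("32-bit CAS loop: %f ms\n64-bit CAS loop: %f ms\n", ms32, ms64);
    return (ms32 >= 0.0f && ms64 >= 0.0f) ? 0 : 1;
}

int main() { return run_benchmark(); }
```

Treat the numbers as specific to the GPU and access pattern you run it on. Changing n_slots changes the contention per address, which is likely to matter more than the operand width.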