I've installed pycuda on a machine featuring a TESLA C2075. I'm running on Ubuntu 14.04 with the CUDA-6.0 compiler installed.
Using python 2.7.9 (via the anaconda distribution) and numpy 1.9.0, I have installed pycuda 2014.1 from the ZIP file that Andreas Kloeckner provides on his website. (http://mathema.tician.de/software/pycuda/)
All of the tests provided in that ZIP file pass, except for test_cumath.py, which fails with the following error:
```
E    AssertionError: (2.3841858e-06, 'cosh', <type 'numpy.complex64'>)
E    assert <built-in method all of numpy.bool_ object at 0x7f00747f3880>()
E     +  where <built-in method all of numpy.bool_ object at 0x7f00747f3880> = 2.3841858e-06 <= 2e-06.all

test_cumath.py:54: AssertionError
===== 1 failed, 27 passed in 12.57 seconds =====
```
Does anyone have a suggestion as to where this discrepancy between the GPU and CPU results for cosh comes from? Being just slightly over the tolerance of 2e-6, with a measured value of 2.38e-6, looks a bit odd to me, especially since the other tests succeed.
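For reference, here is a minimal sketch of how the comparison can be reproduced outside the test suite. It is not the exact code from test_cumath.py, so the error metric (elementwise relative error against numpy) is my assumption; the pycuda calls themselves are the standard gpuarray/cumath API:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath

# Random complex64 input, similar in spirit to what the test feeds to cosh
np.random.seed(0)
a = (np.random.rand(1000) + 1j * np.random.rand(1000)).astype(np.complex64)

a_gpu = gpuarray.to_gpu(a)
gpu_result = cumath.cosh(a_gpu).get()  # cosh evaluated on the GPU
cpu_result = np.cosh(a)                # numpy reference on the CPU

# Elementwise relative error, compared against the 2e-6 tolerance
rel_err = np.abs(gpu_result - cpu_result) / np.abs(cpu_result)
print("max relative error: %g" % rel_err.max())
```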
In the GPGPU/CUDA community it is indeed known that different hardware platforms and CUDA library versions can yield slightly different results for the same API calls. The differences are always small, but there is some heterogeneity across platforms.
Admittedly, this makes it tedious to write tests based on numerical results: the classification into right and wrong becomes less sharp, and one has to answer "what is good enough?". One might consider this problematic, or even faulty, but it is simply the state of affairs and not something to be disputed here.
Where did the 2e-6 tolerance come from in the first place? I'd say someone tried to strike a trade-off between the amount of variance he/she considered still acceptable and the amount of variance he/she expected in practice. In the CPU world, 2e-6 is already a large tolerance, so presumably a deliberately generous value was chosen here to account for the expected degree of heterogeneity among GPU platforms.
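To put both numbers into perspective: a complex64 consists of two float32 components, and the float32 machine epsilon is about 1.19e-7. A quick back-of-the-envelope check (plain numpy, no GPU required) shows that both the tolerance and your measured error are on the order of tens of multiples of that epsilon:

```python
import numpy as np

eps = np.finfo(np.float32).eps  # ~1.19e-07, machine epsilon of float32
print("tolerance in multiples of eps:      %.1f" % (2e-6 / eps))          # ~16.8
print("measured error in multiples of eps: %.1f" % (2.3841858e-06 / eps))  # ~20.0
```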
In your case, practically, this probably means that the tolerance was simply not chosen large enough to reflect the real-world heterogeneity of GPU platforms.
Having said this, the GPGPU community is also aware that a surprising number of GPU cards are flaky (essentially broken). Before running serious applications, GPU cards must be tested exhaustively. In particular, a card should produce reproducible results; fluctuations are an indicator of a broken card. Teslas are usually not affected as much as consumer cards, but we have seen it even there. Do you have a second GPU card of the same type? Does it produce the same results?
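A simple reproducibility check could look like the sketch below (using cosh again merely as a convenient workload; a healthy card should return bit-identical results on every run for the same input):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath

np.random.seed(0)
a = (np.random.rand(10**6) + 1j * np.random.rand(10**6)).astype(np.complex64)

# First run serves as the reference; all later runs must match it exactly
reference = cumath.cosh(gpuarray.to_gpu(a)).get()
for i in range(20):
    result = cumath.cosh(gpuarray.to_gpu(a)).get()
    if not np.array_equal(result, reference):
        print("run %d differs from the first run -- card may be flaky" % i)
        break
else:
    print("all runs are bit-identical")
```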
Either you identify your GPU card as "broken" (by comparing it against other cards of the same type), or you should submit a bug report to PyCUDA and point out that the tolerance does not suffice.