cuda gpgpu nvidia

Many CUDA examples fail


After a fresh install of the CUDA 4.0 drivers and SDK, many of the SDK tests fail (e.g. fastWalshTransform, matrixMul, reduction). Here is the output of ./deviceQuery:

Device 0: "GeForce GTX 570"
  CUDA Driver Version / Runtime Version          4.0 / 4.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1279 MBytes (1341325312 bytes)
  (15) Multiprocessors x (32) CUDA Cores/MP:     480 CUDA Cores
  GPU Clock Speed:                               1.57 GHz
  Memory Clock rate:                             2100.00 Mhz
  Memory Bus Width:                              320-bit
  L2 Cache Size:                                 655360 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           4 / 0

For example, the output of reduction is:

=> FAILED.

Solution

  • It was (and still is) a hardware problem; driver updates do not fix it. It may be a memory issue, and it seems to be quite common: we have several NVIDIA cards showing it, even Tesla cards. The only workarounds we have found so far are to restart the machine or to increase the voltage a little.
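
If you suspect the same kind of memory problem, a rough way to check without the SDK samples is to write a known pattern to device memory and read it back. The following is only a minimal sketch, not an NVIDIA tool: the names `expected_word`, `fill_pattern`, `check_pattern` and `run_pattern_test`, the XOR pattern, the buffer size and the launch configuration are all made up for illustration. It also reads back through the normal cache hierarchy, so a clean run does not prove the card is healthy.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical pattern: word i holds (seed XOR i), so neighbouring
// addresses carry different values.
__host__ __device__ inline unsigned int expected_word(size_t i, unsigned int seed) {
    return seed ^ static_cast<unsigned int>(i);
}

// Fill the buffer with the pattern using a grid-stride loop.
__global__ void fill_pattern(unsigned int *buf, size_t n, unsigned int seed) {
    size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        buf[i] = expected_word(i, seed);
    }
}

// Re-read the buffer and count words that no longer match the pattern.
__global__ void check_pattern(const unsigned int *buf, size_t n,
                              unsigned int seed, unsigned int *errors) {
    size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        if (buf[i] != expected_word(i, seed)) {
            atomicAdd(errors, 1u);
        }
    }
}

// Returns the number of mismatching words, or -1 on a CUDA runtime error.
long long run_pattern_test(size_t n, unsigned int seed) {
    unsigned int *buf = NULL;
    unsigned int *d_errors = NULL;
    unsigned int h_errors = 0;

    if (cudaMalloc(&buf, n * sizeof(unsigned int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_errors, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(buf);
        return -1;
    }
    cudaMemset(d_errors, 0, sizeof(unsigned int));

    fill_pattern<<<64, 256>>>(buf, n, seed);
    check_pattern<<<64, 256>>>(buf, n, seed, d_errors);

    // cudaMemcpy waits for both kernels and reports any launch or execution error.
    cudaError_t err = cudaMemcpy(&h_errors, d_errors, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(buf);
    cudaFree(d_errors);
    return err == cudaSuccess ? static_cast<long long>(h_errors) : -1;
}

int main() {
    const size_t n = 64u * 1024u * 1024u;  // 64 Mi words = 256 MiB (arbitrary choice)
    const unsigned int seeds[] = {0x00000000u, 0xFFFFFFFFu, 0xA5A5A5A5u, 0x5A5A5A5Au};
    int failed = 0;

    for (int pass = 0; pass < 4; ++pass) {
        long long bad = run_pattern_test(n, seeds[pass]);
        if (bad < 0) {
            printf("pass %d: CUDA runtime error\n", pass);
            failed = 1;
        } else {
            printf("pass %d: %lld mismatching words\n", pass, bad);
            if (bad > 0) failed = 1;
        }
    }
    return failed;
}
```

Build it with something like `nvcc memcheck.cu -o memcheck` (the file name is arbitrary) and run it a few times, ideally both right after a reboot and after the card has been under load for a while. Mismatches or runtime errors on a card where the SDK samples also fail would point to the hardware rather than the driver install.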