cuda unified-memory

cudaMallocManaged() issues on NVIDIA P100


I am trying to compile and run the following code on an NVIDIA P100. I'm running CentOS 6.9 with driver version 396.37 and CUDA 9.2. As far as I can tell, these driver and CUDA versions are compatible.

#include <stdio.h>
#include <cuda_runtime_api.h>
int main(int argc, char *argv[])
{
    // Declare variables
    int * dimA = NULL; //{2,3};
    cudaMallocManaged(&dimA, 2 * sizeof(int)); // dimA is an int array, so size it with int
    dimA[0] = 2;
    dimA[1] = 3;
    cudaDeviceSynchronize();
    printf("The End\n");

    return 0;
}

It fails with a segmentation fault. When I compile with nvcc -g -G src/get_p100_to_work.cu and load the resulting core file in cuda-gdb (cuda-gdb ./a.out core.277512), I get:

Reading symbols from ./a.out...done.
[New LWP 277512]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
71      dimA[0] = 2;
(cuda-gdb) bt full
#0  0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
        dimA = 0x0
(cuda-gdb)
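
Note that the backtrace shows dimA = 0x0 at the faulting line: cudaMallocManaged() never produced a usable pointer, and its return value was not checked, so the first host write dereferenced NULL. A minimal sketch of the same program with the status checked follows. make_dims is a hypothetical helper name, not part of the original code, and the error string actually reported will depend on what is wrong with the installation.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): allocate two managed ints
// and fill them, returning the CUDA status instead of assuming success.
static cudaError_t make_dims(int **out)
{
    int *dims = NULL;
    cudaError_t err = cudaMallocManaged((void **)&dims, 2 * sizeof(int));
    if (err != cudaSuccess) {
        return err;  // allocation failed: do not dereference dims
    }
    dims[0] = 2;
    dims[1] = 3;
    *out = dims;
    return cudaSuccess;
}

int main(void)
{
    int *dimA = NULL;

    cudaError_t err = make_dims(&dimA);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize failed: %s\n", cudaGetErrorString(err));
        cudaFree(dimA);
        return 1;
    }

    printf("dimA = {%d, %d}\n", dimA[0], dimA[1]);
    cudaFree(dimA);
    return 0;
}
```

With the check in place, a failed allocation exits with a readable CUDA error message rather than a segmentation fault, which narrows the problem down to the runtime or driver rather than the program logic.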

When I run the same code on an NVIDIA K40, it runs without error.

QUESTION :

How do I get my code to run on the P100? According to this tutorial, this code should run.


Solution

  • Previously, I had cloned an image of a GPU node with two K40s in it and put that image on a node with two P100s. I suspect that installing the driver on the K40 node sets up configuration specific to the graphics cards in that machine (which makes sense), and that this configuration was not compatible with the P100. Since the driver on the P100 machine was effectively broken, this would explain why my code failed so catastrophically.

    Solution: I ended up having to reinstall the driver, and now it works.
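
    After reimaging a node or reinstalling the driver, one way to sanity-check the installation is to ask the CUDA runtime what it sees. The following is only a sketch, not the procedure used above. decode_version is a hypothetical helper, and it assumes the usual CUDA version encoding of 1000 * major + 10 * minor.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: split a CUDA version integer into major and minor,
// assuming the encoding 1000 * major + 10 * minor (so 9020 means 9.2).
static void decode_version(int v, int *major, int *minor)
{
    *major = v / 1000;
    *minor = (v % 1000) / 10;
}

int main(void)
{
    int driver = 0, runtime = 0, count = 0, major = 0, minor = 0;
    cudaError_t err;

    // Latest CUDA version supported by the installed driver (0 if no driver).
    err = cudaDriverGetVersion(&driver);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }
    decode_version(driver, &major, &minor);
    printf("Driver supports CUDA %d.%d\n", major, minor);

    // Version of the CUDA runtime the program is using.
    err = cudaRuntimeGetVersion(&runtime);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }
    decode_version(runtime, &major, &minor);
    printf("Runtime is CUDA %d.%d\n", major, minor);

    // Enumerate the devices the runtime can actually use.
    err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n", i,
                    cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s (compute %d.%d), managedMemory=%d, "
               "concurrentManagedAccess=%d\n",
               i, prop.name, prop.major, prop.minor,
               prop.managedMemory, prop.concurrentManagedAccess);
    }
    return 0;
}
```

    If the driver is missing or broken, one of these calls is likely to return an error (or the driver version may be reported as 0), which is easier to diagnose than a segmentation fault on the first access to managed memory.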