c++ c cuda thrust multiple-gpu

Initialize struct on different GPUs


I have a struct that looks like this:

struct LstmLayer {
  int deviceId;
  thrust::device_vector <real_t> W;
  thrust::device_vector <real_t> gradW;

  LstmLayer() : deviceId(0) {}

  LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}

  void setDevice(int __deviceId__) { deviceId = __deviceId__; }

  void init(bool initParams) {
    W.resize(4*lstmSize * 2*lstmSize);
    gradW.resize(4*lstmSize * 2*lstmSize);

    if (initParams) GPU_Random_Vector(W);
  }
};

Now I want to initialize an array of LstmLayer, with each element on a different GPU device. I do it as follows:

  struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS];

  for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true);
  }

Running this program gives the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  driver shutting down

Please tell me what's wrong with my code and how to do it right. Thank you in advance.


Solution

  • The problem is that you are instantiating all the vectors in the same GPU context and then trying to use them in a different context. This happens because the default constructor of each device_vector is called when you define the array of structures. To annotate your code:

    struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS]; // default constructor for each device vector called here in one GPU context.
    
    for (int i = 0; i < numLstmLayers; ++i) {
        CUDA_SAFE_CALL(cudaSetDevice(i));
        lstmLayers[i].setDevice(i);
        lstmLayers[i].init(true); // Error here, you changed to a different device and called resize
    }
    

    The solution is probably to redefine the device vectors as pointers, and explicitly call their constructors in your init method. There are a lot of different ways this could be done, for example:

     struct LstmLayer {
      int deviceId;
      thrust::device_vector <real_t> * W;
      thrust::device_vector <real_t> * gradW;
    
      LstmLayer() : deviceId(0) {}
    
      LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}
    
      void setDevice(int __deviceId__) { deviceId = __deviceId__; }
    
      // Call after cudaSetDevice(deviceId) so the vectors are constructed on that device.
      void init(bool initParams) {
        W = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);
        gradW = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);
    
        if (initParams) GPU_Random_Vector(*W); // W is now a pointer, so dereference it
      }
    };
    

    [disclaimer: written in browser, never compiled, use at own risk]

    Obviously you would need to define a destructor to prevent memory leaks. There are other possibilities; I'll leave those as an exercise for the reader.
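
One of those other possibilities, sketched here under assumptions rather than taken from the original code: hold the vectors in std::unique_ptr, so that construction is still deferred until init() but no hand-written destructor is needed. The names LstmLayerSketch and release() are hypothetical. The vector type is a template parameter only so that the sketch compiles without CUDA; in the setting above it would presumably be thrust::device_vector<real_t>, and the caller would still call cudaSetDevice(deviceId) before init().

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: Vec stands in for thrust::device_vector<real_t>.
// Any type constructible from a size (e.g. std::vector<float>) works here.
template <typename Vec>
struct LstmLayerSketch {
  int deviceId = 0;
  std::unique_ptr<Vec> W;      // empty until init(): nothing is constructed
  std::unique_ptr<Vec> gradW;  // when an array of layers is defined

  // Assumption: the caller has already made deviceId the current device,
  // so both vectors are constructed while that device is selected.
  void init(std::size_t n) {
    W = std::make_unique<Vec>(n);
    gradW = std::make_unique<Vec>(n);
  }

  // Destroys both vectors now instead of waiting for the struct's destructor.
  void release() {
    W.reset();
    gradW.reset();
  }
};
```

With this layout, the loop from the question would stay the same apart from the size argument to init(). It is probably safest to select the owning device again and call release() on each layer before the program exits, rather than relying on destruction order at shutdown.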