I have a struct that looks like this:

struct LstmLayer {
    int deviceId;
    thrust::device_vector<real_t> W;
    thrust::device_vector<real_t> gradW;

    LstmLayer() : deviceId(0) {}
    LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}

    void setDevice(int __deviceId__) { deviceId = __deviceId__; }

    void init(bool initParams) {
        W.resize(4*lstmSize * 2*lstmSize);
        gradW.resize(4*lstmSize * 2*lstmSize);
        if (initParams) GPU_Random_Vector(W);
    }
};
Now I want to initialize an array of LstmLayer, with each element on a different GPU device. I do it as follows:

struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS];
for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true);
}
Running this program gives the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  driver shutting down

Please tell me what's wrong with my code and how to do it right. Thank you in advance.
The problem is that you are instantiating all of the vectors in the same CUDA GPU context and then trying to use them in a different one. This happens because the default constructor of each device_vector runs when you define the array of structures, before any call to cudaSetDevice. To annotate your code:
struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS]; // default constructor of every device_vector runs here, in a single GPU context
for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true); // error here: you switched to a different device and then called resize
}
The solution is probably to redefine the device vectors as pointers and explicitly call their constructors in your init method. There are a lot of different ways this could be done, for example:
struct LstmLayer {
    int deviceId;
    thrust::device_vector<real_t> *W;
    thrust::device_vector<real_t> *gradW;

    LstmLayer() : deviceId(0) {}
    LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}

    void setDevice(int __deviceId__) { deviceId = __deviceId__; }

    void init(bool initParams) {
        W = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);
        gradW = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);
        if (initParams) GPU_Random_Vector(*W); // W is a pointer now, so dereference it
    }
};
[disclaimer: written in browser, never compiled, use at own risk]
Obviously you would need to define a destructor to prevent memory leaks. There are other possibilities; I'll leave those as an exercise for the reader.
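If it helps, here is a minimal, untested sketch of one such possibility, using std::unique_ptr (C++11) so most of the cleanup is automatic. Here real_t, lstmSize, MAX_NUM_LSTM_LAYERS and numLstmLayers are placeholders standing in for your own definitions, and your GPU_Random_Vector call is left as a comment:

#include <thrust/device_vector.h>
#include <cuda_runtime.h>
#include <memory>

typedef float real_t;               // placeholder: use your own real_t
const int lstmSize = 128;           // placeholder: use your own lstmSize
const int MAX_NUM_LSTM_LAYERS = 4;  // placeholder

struct LstmLayer {
    int deviceId;
    std::unique_ptr<thrust::device_vector<real_t>> W;
    std::unique_ptr<thrust::device_vector<real_t>> gradW;

    LstmLayer() : deviceId(0) {}
    LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}
    void setDevice(int __deviceId__) { deviceId = __deviceId__; }

    void init(bool initParams) {
        // Assumes the caller has already selected deviceId with cudaSetDevice,
        // so these allocations land on the intended GPU.
        W.reset(new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize));
        gradW.reset(new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize));
        if (initParams) {
            // GPU_Random_Vector(*W); // your existing initialiser; W is a pointer-like handle here
        }
    }

    ~LstmLayer() {
        // Cautious choice: re-select the owning device before the vectors are
        // released, so deallocation happens in the same device context as allocation.
        cudaSetDevice(deviceId);
        W.reset();
        gradW.reset();
    }
};

int main() {
    const int numLstmLayers = 2;    // placeholder; error checking omitted for brevity
    LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS];
    for (int i = 0; i < numLstmLayers; ++i) {
        cudaSetDevice(i);           // select the device *before* allocating anything on it
        lstmLayers[i].setDevice(i);
        lstmLayers[i].init(true);
    }
    return 0;
}

Re-selecting the owning device in the destructor may not be strictly necessary on every setup, but it keeps allocation and deallocation in the same device context, which avoids any doubt.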