I've read caffe2 tutorials and tried pre-trained models. I knew caffe2 will leverge GPU to run the model/net. But the input data seems always be given from CPU(ie. Host) memory. For example, in Loading Pre-Trained Models, after model is loaded, we can predict an image by
result = p.run([img])
However, image "img" should be read in CPU scope. What I look for is a framework that can pipline the images (which is decoded from a video and still resides in GPU memory) directly to the prediction model, instead of copying it from GPU to CPU scope, and then transfering to GPU again to predict result. Is Caffe or Caffe2 provides such functions or interfaces for python or C++? Or should I need to patch Caffe to do so? Thanks at all.
Here is my solution:
I'd found in tensor.h
, function ShareExternalPointer()
can exactly do what I want.
Feed gpu data this way,
pInputTensor->ShareExternalPointer(pGpuInput, InputSize);
then run the predict net through
pPredictNet->Run();
where pInputTensor
is the entrance tensor for the predict net pPredictNet
I don't think you can do it in caffe with python interface.
But I think that it can be accomplished using the c++: In c++ you have access to the Blob
's mutable_gpu_data()
. You may write code that run on device and "fill" the input Blob's mutable_gpu_data()
directly from gpu. Once you made this update, caffe should be able to continue its net->forward()
from there.
UPDATE
On Sep 19th, 2017 PR #5904 was merged into master. This PR exposes GPU pointers of blobs via the python interface.
You may access blob._gpu_data_ptr
and blob._gpu_diff_ptr
directly from python at your own risk.