deep-learning pytorch gpu opencl

Can you accelerate torch DL training on anything other than "cuda", like "hip" or "OpenCL"?


I've noticed that torch.device can accept a range of arguments, namely cpu, cuda, mkldnn, opengl, opencl, ideep, hip, and msnpu.

However, when training deep learning models, I've only ever seen cuda or cpu being used. Very often the code looks something like this:

import torch

# Fall back to the CPU when no CUDA device is available
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

I've never seen any of the others used, and I was wondering whether they can be used, and how. The latest MacBooks with an AMD graphics card should, I believe, be able to use "hip", but is that true? And would the training speed be comparable to that of a single CUDA GPU? If not, what is the point of torch.device accepting so many options if they can't actually be used?


Solution

  • If you want to use a GPU for deep learning, the selection is between CUDA and CUDA...

    The broader answer: yes, there are AMD's HIP and some OpenCL implementations:

    1. There is HIP by AMD, a CUDA-like interface with ports of PyTorch, hipCaffe, and TensorFlow (see the device-selection sketch after this list), but:
      • AMD's HIP/ROCm is supported only on Linux; ROCm provides no Windows or macOS support.
      • Even on Linux with an AMD GPU + ROCm, you have to stick to GCN discrete devices (i.e. cards like the RX 580, Vega 56/64, or Radeon VII). There is no HIP/ROCm support for RDNA devices (a year after their release) and none looks to be coming any time soon; APUs aren't supported by HIP either.
    2. The only popular frameworks that support OpenCL are Caffe and Keras+PlaidML. But:
      • Caffe's issues:
        • Caffe no longer seems to be actively developed and is somewhat outdated by today's standards
        • The performance of Caffe's OpenCL implementation is about 1/2 of what NVIDIA's cuDNN and AMD's MIOpen provide, but it works quite well, and I have used it in many cases.
        • The latest version has an even greater performance hit (https://github.com/BVLC/caffe/issues/6585), but at least you can run a working version from several commits back
        • Also, while Caffe/OpenCL works, there are still some bugs I had to fix manually for OpenCL on AMD: https://github.com/BVLC/caffe/issues/6239
      • Keras/Plaid-ML
        • Keras on its own is a much weaker framework in terms of access to lower-level functionality
        • PlaidML's performance is still 1/2 to 1/3 of the optimized NVIDIA cuDNN and AMD MIOpen/ROCm stacks, and slower than Caffe/OpenCL in the tests I did
        • The future of non-TF backends for Keras is unclear, since as of 2.4 it requires TF...
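
    A side note on how HIP surfaces in PyTorch itself: ROCm builds expose HIP devices through the regular "cuda" device type rather than through torch.device("hip"), so the usual CUDA selection idiom keeps working unchanged. A minimal sketch; torch.version.hip should distinguish a ROCm build from a CUDA one:

        import torch

        # ROCm (HIP) builds of PyTorch reuse the "cuda" device type, so the
        # standard selection pattern works unchanged on supported AMD GPUs.
        if torch.cuda.is_available():
            device = torch.device("cuda")  # a HIP device on a ROCm build
            # torch.version.hip is a version string on ROCm builds, None on CUDA builds
            if torch.version.hip is not None:
                print("AMD GPU via ROCm/HIP:", torch.version.hip)
            else:
                print("NVIDIA GPU via CUDA:", torch.version.cuda)
        else:
            device = torch.device("cpu")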

    Bottom line:

    1. If you have a GCN discrete AMD GPU and you run Linux, you can use ROCm+HIP. But it isn't as stable as CUDA.
    2. You can try OpenCL Caffe or Keras-PlaidML; they may be slower and less optimized than the other solutions, but you have a higher chance of getting them to work (see the PlaidML sketch below).
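
    For the Keras+PlaidML route from point 2 above, the setup amounts to switching the Keras backend before importing Keras. A minimal sketch, assuming the plaidml-keras package is installed, plaidml-setup has been run to pick an OpenCL device, and a standalone Keras (< 2.4) is in use:

        import os

        # Point Keras at the PlaidML backend; this must happen before
        # keras itself is imported (works only with standalone Keras < 2.4).
        os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

        from keras.layers import Dense
        from keras.models import Sequential

        # From here on the model runs on the OpenCL device chosen via plaidml-setup.
        model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
        model.compile(optimizer="sgd", loss="categorical_crossentropy")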

    Edit 2021-09-14: there is a new project, dlprimitives:

    https://github.com/artyom-beilis/dlprimitives

    that has better performance than both Caffe/OpenCL and Keras: it reaches ~75% of Keras/TF2's training performance. However, it is in early development and at this point supports a much more limited set of layers than Caffe or Keras-PlaidML.

    The connection to PyTorch is a work in progress, with some initial results: https://github.com/artyom-beilis/pytorch_dlprim
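
    For illustration only, the integration follows PyTorch's mechanism for out-of-tree backends: the extension is built as a shared library, loaded into the process, and tensors are then moved to its device. Both the library path and the device string in this sketch are assumptions; check the pytorch_dlprim README for the actual build artifact and device naming:

        import torch

        # Hypothetical usage of the pytorch_dlprim OpenCL backend.
        # The .so path below is an assumed build output, not a documented name.
        torch.ops.load_library("build/libpt_ocl.so")

        # "privateuseone" is the device type PyTorch reserves for out-of-tree
        # backends; the exact device string here is an assumption as well.
        device = torch.device("privateuseone:0")
        x = torch.randn(16, 3, 224, 224).to(device)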

    Disclaimer: I'm the author of this project.