python, machine-learning, gpu, google-colaboratory, turi-create

Training an object detection model with GPU on Google Colab with TuriCreate


I have been trying for days to train an object detection model on Google Colab using a GPU with TuriCreate.

According to the TuriCreate repository, to use a GPU during training you must follow these instructions:

https://github.com/apple/turicreate/blob/main/LinuxGPU.md
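
As far as I understand, TuriCreate's object detector on Linux runs on top of TensorFlow, so the GPU first has to be visible to TensorFlow itself. A minimal sanity check (not part of the linked instructions, and assuming TensorFlow is already installed in the runtime):

import tensorflow as tf

# An empty device list here would explain the "Using CPU to create model." message.
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))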

However, every time I start training, the shell produces this output before the training begins:

"Using CPU to create model."

My Colab notebook is structured as follows:

Set up the CUDA environment

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
!sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
!sudo apt-get update

!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

!sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update

!wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt-get update

# Install development and runtime libraries (~4GB)
!sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0

# Install TensorRT. Requires that libcudnn8 is installed above.
!sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0
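
As an optional extra check (the library names below assume the CUDA 11.0 / cuDNN 8 packages installed above), it is possible to verify that the freshly installed shared libraries can actually be loaded:

import ctypes

# Try to dlopen the CUDA runtime and cuDNN; a failure here points at the
# installation or the loader configuration rather than at TuriCreate itself.
for lib in ('libcudart.so.11.0', 'libcudnn.so.8'):
    try:
        ctypes.CDLL(lib)
        print(lib, 'loaded OK')
    except OSError as err:
        print(lib, 'NOT loadable:', err)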

Check installation with nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Dependency installation

!pip install turicreate
!pip uninstall -y tensorflow
!pip install tensorflow-gpu 
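
Right after this step it is worth checking which versions actually ended up installed, since an unpinned tensorflow-gpu simply pulls the latest release, which may be newer than what the installed TuriCreate build supports (the solution below ends up pinning TensorFlow for this reason). A quick check, assuming the runtime has been restarted after the reinstall:

import tensorflow as tf
import turicreate as tc

# Versions importable in the current runtime.
print('tensorflow:', tf.__version__)
print('turicreate:', tc.__version__)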

Set up bash environment variables

!echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH >> ~/.bashrc
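
Note that appending to ~/.bashrc only affects future interactive bash shells, not the notebook kernel that is already running, so it is worth checking separately whether the dynamic loader can resolve the CUDA libraries. A minimal sketch (library names again assume CUDA 11.0 / cuDNN 8):

import subprocess

# Query the loader cache; empty results suggest the LD_LIBRARY_PATH / ldconfig
# setup is a likely reason the GPU is not picked up.
cache = subprocess.run(['ldconfig', '-p'], capture_output=True, text=True).stdout
for name in ('libcudart.so.11.0', 'libcudnn.so.8'):
    hits = [line.strip() for line in cache.splitlines() if name in line]
    print(name, '->', hits if hits else 'not found')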

Training

tc.config.set_num_gpus(-1)
model = tc.object_detector.create(train_sf)
scores = model.evaluate(valid_sf)
print(scores['mean_average_precision'])
model.export_coreml('model.mlmodel')

This is the output:

TuriCreate currently only supports using one GPU. Setting 'num_gpus' to 1.
Using 'image' as feature column
Using 'annotations' as annotations column

Using CPU to create model.

Setting 'batch_size' to 32

I can't figure out what I'm missing.


Solution

  • I managed to solve this: the problem is caused by the version of TensorFlow pre-installed on the Colab machine.

    !pip uninstall -y tensorflow
    !pip uninstall -y tensorflow-gpu
    !pip install turicreate
    !pip install tensorflow==2.4.0
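
    After reinstalling, you will likely need to restart the Colab runtime before the new TensorFlow is picked up, then re-run the training cell. A quick sanity check that the fix took effect (a minimal sketch, not part of the commands above):

    import tensorflow as tf

    # With TensorFlow 2.4.0 in place the GPU should be visible again, and
    # object_detector.create should no longer report "Using CPU to create model."
    print(tf.__version__)                          # expect 2.4.0
    print(tf.config.list_physical_devices('GPU'))  # expect one GPU entry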