For days I've been trying to train an object detection model on Google Colab with a GPU using TuriCreate.
According to TuriCreate's repository, to use a GPU during training you must follow these instructions:
https://github.com/apple/turicreate/blob/main/LinuxGPU.md
However, every time I start the training, the shell prints this line before the training begins:
"Using CPU to create model."
My Colab notebook is structured as follows:
Set up CUDA environment
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
!sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update
!wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt-get update
# Install development and runtime libraries (~4GB)
!sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
# Install TensorRT. Requires that libcudnn8 is installed above.
!sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
libnvinfer-dev=7.1.3-1+cuda11.0 \
libnvinfer-plugin7=7.1.3-1+cuda11.0
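As a sanity check that the CUDA runtime and cuDNN libraries actually load (a minimal sketch; the library names assume the exact versions installed above):
import ctypes

# Try to dlopen the CUDA runtime and cuDNN; an OSError means the library
# is missing or not on the loader path.
for lib in ("libcudart.so.11.0", "libcudnn.so.8"):
    try:
        ctypes.CDLL(lib)
        print(lib, "loaded OK")
    except OSError as err:
        print(lib, "failed to load:", err)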
Check installation with nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P8 27W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Dependency installation
!pip install turicreate
!pip uninstall -y tensorflow
!pip install tensorflow-gpu
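It's also worth confirming that the freshly installed TensorFlow build is CUDA-enabled and can see the device (standard TensorFlow 2.x calls, nothing TuriCreate-specific):
import tensorflow as tf

print(tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))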
Set up bash environment variables
!echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH >> ~/.bashrc
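Note that notebook cells don't re-source ~/.bashrc, so the export above only affects new shells. To make the path visible to subprocesses started from the current session as well, it can also be set from Python (a sketch; the path assumes the default CUDA install location):
import os

# Prepend the CUDA library directory for child processes spawned from
# this notebook session; libraries already loaded are unaffected.
os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda/lib64:' + os.environ.get('LD_LIBRARY_PATH', '')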
Training
import turicreate as tc

tc.config.set_num_gpus(-1)  # -1 asks TuriCreate to use all available GPUs
model = tc.object_detector.create(train_sf)
scores = model.evaluate(valid_sf)
print(scores['mean_average_precision'])
model.export_coreml('model.mlmodel')
This is the output:
TuriCreate currently only supports using one GPU. Setting 'num_gpus' to 1.
Using 'image' as feature column
Using 'annotations' as annotations column
Using CPU to create model.
Setting 'batch_size' to 32
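Reading the setting back via TuriCreate's config API (get_num_gpus, the documented counterpart of the setter used above) shows what TuriCreate settled on:
import turicreate as tc

# Per the log above, set_num_gpus(-1) gets capped to a single GPU.
print(tc.config.get_num_gpus())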
I can't understand what I'm missing.
I managed to solve this: the problem was caused by the version of TensorFlow pre-installed on the Colab machine. Uninstalling both TensorFlow packages, then installing TuriCreate and pinning TensorFlow to 2.4.0, fixed it:
!pip uninstall -y tensorflow
!pip uninstall -y tensorflow-gpu
!pip install turicreate
!pip install tensorflow==2.4.0
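After restarting the Colab runtime so the reinstalled packages are picked up, a quick version check confirms the pin took effect before re-running the training code above:
import tensorflow as tf
import turicreate as tc

print("tensorflow:", tf.__version__)   # expect 2.4.0
print("turicreate:", tc.__version__)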