python tensorflow google-colaboratory transfer-learning mask-rcnn

Mask R-CNN load_weights function does not work in Google Colab with tensorflow.compat.v1

I want to train a Mask R-CNN model in Google Colab using transfer learning. For that, I'm using the pre-trained COCO weights file (coco.h5). I installed Mask R-CNN with !pip install mrcnn-colab. However, the following call does not load the weights: model.load_weights(COCO_MODEL_PATH, by_name=True). The layer names are correct, and by_name=False leads to the same problem. I can confirm this with the following lines:

from mrcnn import visualize
visualize.display_weight_stats(model)

This displays the same values both before and after loading (I just show the first 10 layers):

I believe I've found the cause of this problem. It involves the following lines of code:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf.compat.v1.get_default_graph()

This workaround is often recommended because Mask R-CNN requires TensorFlow 1.x, whereas the latest TensorFlow version is 2.x and Colab no longer supports TensorFlow 1.x. I therefore used it, but unfortunately it results in the load_weights function not working. I managed to adjust my code so that import tensorflow.compat.v1 is no longer necessary by using the modified model.py and utils.py from https://github.com/ahmedfgad/Mask-RCNN-TF2/tree/master, which requires a Python version lower than 3.10 (3.10 is the default in Colab).

For the Python downgrade, I used the following commands:

!apt-get update -y
!update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
!update-alternatives --config python3
!apt install python3-pip
!apt install python3.7-distutils

This resulted in the installation of another Python version, but I am unable to use it in Colab. Colab always defaults to using Python 3.10. This can be confirmed by running the following code:

import sys
print("User Current Version:-", sys.version)

which results in the following output:

User Current Version:- 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
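A likely reason the switch has no visible effect is that commands prefixed with ! run in a separate shell, while sys.version reports the interpreter that the already-running notebook kernel was started with. The sketch below prints both side by side; shell_python_version is a helper name introduced here for illustration, and the exact paths it prints will depend on the runtime.

```python
import shutil
import subprocess
import sys


def shell_python_version(command="python3"):
    """Return what `command --version` prints, or None if the command is not on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Python 3 prints its version on stdout; fall back to stderr just in case
    return (result.stdout or result.stderr).strip()


# Interpreter that runs the notebook cells (the kernel)
kernel_version = "Python " + ".".join(str(n) for n in sys.version_info[:3])
print("Kernel interpreter:", sys.executable, "->", kernel_version)

# Interpreter that a `!python3 ...` shell command would pick up
print("Shell python3:     ", shell_python_version())
```

If the two lines disagree, the alternatives switch only changed the shell's python3, not the kernel that executes the cells.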

Therefore, I created a new runtime in Colab with Python 3.7.6 as follows:

!wget -O mini.sh https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
!chmod +x mini.sh
!bash ./mini.sh -b -f -p /usr/local
!conda install -q -y jupyter
!conda install -q -y google-colab -c conda-forge
!python -m ipykernel install --name "py37" --user

After switching to this runtime, I upgraded the Python version to 3.7.11, which I actually needed:

!conda install python=3.7.11 -y

With these adjustments, I can load the weights; however, I am limited to the CPU, because Colab's CUDA version is not compatible with this Python version and I was unable to downgrade it. In addition, the new runtime frequently has to be restarted, as it tends to freeze when I click the run button. So I have the following questions:

How can I downgrade the CUDA version to 10.1? I've already tried various approaches, but I always come to the conclusion that it's not possible in Colab.
Is it possible to force Colab to use a previously installed Python version?
Is there an alternative to the import tensorflow.compat.v1 as tf code that allows loading the weights?

Solution

You can use this implementation (https://github.com/z-mahmud22/Mask-RCNN_TF2.14.0), which is built on top of the original Mask R-CNN repo to support TF2. The repository lets you train and test the Mask R-CNN model with TensorFlow 2.14.0 and Python 3.10.12.

It also works on Google Colab (the current Colab environment also uses Python 3.10.12 and TF 2.14.0) and runs on the GPU without any issues. Make sure your runtime is using the GPU (Runtime > Change runtime type), and then follow these exact steps:

# Clone the repo
!git clone https://github.com/z-mahmud22/Mask-RCNN_TF2.14.0.git maskrcnn  

# Change the working directory to the cloned repo
import os 
os.chdir('/content/maskrcnn/')

# Download pre-trained weights
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
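Before building the model, it can be worth confirming that the runtime really exposes a GPU. The following is a minimal sketch that asks nvidia-smi for the device names using only the standard library; gpu_names is a helper name introduced here, and an empty list simply means no GPU was found.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print("GPUs visible:", gpu_names() or "none")

# With TensorFlow imported, the equivalent check from TF's side would be:
# import tensorflow as tf
# print(tf.config.list_physical_devices("GPU"))
```

If the list is empty, switching the runtime type to GPU and restarting the runtime should be the first thing to try.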

And then use this snippet to load weights into the Mask R-CNN model:

import mrcnn
import mrcnn.config
import mrcnn.model

# Class names of the COCO dataset; the background class (BG) comes first
CLASS_NAMES = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

class SimpleConfig(mrcnn.config.Config):
    # Give the configuration a recognizable name
    NAME = "coco_inference"
    
    # set the number of GPUs to use along with the number of images per GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes = number of object classes + 1 for the background class (BG).
    # CLASS_NAMES already contains BG, so its length is the total.
    NUM_CLASSES = len(CLASS_NAMES)

# Initialize the Mask R-CNN model for inference and then load the weights.
# This step builds the Keras model architecture.
model = mrcnn.model.MaskRCNN(mode="inference", 
                             config=SimpleConfig(),
                             model_dir=os.getcwd())

# Load the weights into the model
model.load_weights(filepath="mask_rcnn_coco.h5", 
                   by_name=True)
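Since the question is about transfer learning: the snippet above loads the weights for the full COCO class list. When fine-tuning on a dataset with a different NUM_CLASSES, the samples in the original Mask R-CNN repo skip the class-dependent head layers through the exclude argument of load_weights. Here is a minimal sketch of that pattern; load_coco_backbone and HEAD_LAYERS are names introduced here for illustration, and it assumes the TF2 fork keeps the exclude parameter of the original repo.

```python
# Head layers whose shapes depend on NUM_CLASSES
# (layer names as used in the original Mask R-CNN repo)
HEAD_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def load_coco_backbone(model, weights_path="mask_rcnn_coco.h5"):
    """Load COCO weights by layer name, skipping the class-dependent heads.

    `model` is expected to be a MaskRCNN object built in training mode with
    your own config (assumption: its load_weights accepts `exclude`).
    """
    model.load_weights(filepath=weights_path, by_name=True, exclude=HEAD_LAYERS)
    return model


# Usage in the notebook (MyConfig is a hypothetical config with your NUM_CLASSES):
# model = mrcnn.model.MaskRCNN(mode="training", config=MyConfig(), model_dir=os.getcwd())
# load_coco_backbone(model)
```

The excluded layers keep their random initialization and are then trained on the new classes.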

If you followed all of these steps, the pre-trained weights should load without any issue. You can verify the change in weights with:

from mrcnn import visualize
visualize.display_weight_stats(model)

which prints out:

# Showing the first 10 layers as done in the question
WEIGHT NAME                              SHAPE           MIN        MAX           STD
conv1/kernel:0                           (7, 7, 3, 64)   -0.8616    +0.8451       +0.1315
conv1/bias:0                             (64,)           -0.0002    +0.0004       +0.0001
bn_conv1/gamma:0                         (64,)           +0.0835    +2.6411       +0.5091
bn_conv1/beta:0                          (64,)           -2.3931    +5.3610       +1.9781
bn_conv1/moving_mean:0                   (64,)           -173.0470  +116.3013     +44.5654
bn_conv1/moving_variance:0*** Overflow?  (64,)           +0.0000    +146335.3594  +21847.9668
res2a_branch2a/kernel:0                  (1, 1, 64, 64)  -0.6574    +0.3179       +0.0764
res2a_branch2a/bias:0                    (64,)           -0.0022    +0.0082       +0.0018
bn2a_branch2a/gamma:0                    (64,)           +0.2169    +1.8489       +0.4116
bn2a_branch2a/beta:0                     (64,)           -2.1180    +3.7332       +1.1786
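Comparing two printed tables by eye is easy to get wrong, so a programmatic check can help. The sketch below fingerprints each weight array before and after loading and counts how many changed. weight_fingerprints and count_changed are names introduced here, and the usage lines assume that the MaskRCNN object exposes its Keras model as model.keras_model, as in the original repo.

```python
import hashlib


def weight_fingerprints(weights):
    """Return one SHA-256 hex digest per weight array (any object with .tobytes())."""
    return [hashlib.sha256(w.tobytes()).hexdigest() for w in weights]


def count_changed(before, after):
    """Count the positions whose fingerprint differs between two snapshots."""
    return sum(b != a for b, a in zip(before, after))


# Usage in the notebook (take the first snapshot BEFORE calling load_weights):
# before = weight_fingerprints(model.keras_model.get_weights())
# model.load_weights(filepath="mask_rcnn_coco.h5", by_name=True)
# after = weight_fingerprints(model.keras_model.get_weights())
# print(count_changed(before, after), "of", len(after), "weight arrays changed")
```

A count of zero after load_weights would reproduce the problem described in the question.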

Here's a snippet to visualize the predictions from the pre-trained Mask R-CNN:

import cv2
import mrcnn.visualize
# Load the input image and convert it from BGR to RGB channel order
image = cv2.imread("test.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Perform a forward pass of the network to obtain the results
r = model.detect([image], verbose=0)

# Get the results for the first image.
r = r[0]

# Visualize the detected objects.
mrcnn.visualize.display_instances(image=image, 
                                  boxes=r['rois'], 
                                  masks=r['masks'], 
                                  class_ids=r['class_ids'], 
                                  class_names=CLASS_NAMES, 
                                  scores=r['scores'])

which displays the input image with the detected boxes, masks, class names, and scores drawn on it.
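Separately from the plot, the result dictionary r can also be inspected as plain text, which is handy when no figure is rendered. The following is a minimal sketch, assuming the rois are stored as (y1, x1, y2, x2) boxes as in the original Mask R-CNN repo; summarize_detections is a helper name introduced here, and the demo values are made up.

```python
def summarize_detections(result, class_names):
    """Return one text line per detection: class name, score, and box corners."""
    lines = []
    for class_id, score, box in zip(result["class_ids"], result["scores"], result["rois"]):
        y1, x1, y2, x2 = (int(v) for v in box)
        name = class_names[int(class_id)]
        lines.append(f"{name}: {float(score):.2f} at (y1={y1}, x1={x1}, y2={y2}, x2={x2})")
    return lines


# Made-up values for illustration; in the notebook, pass `r` and CLASS_NAMES instead
demo_result = {
    "class_ids": [1, 3],
    "scores": [0.987, 0.5],
    "rois": [[0, 1, 2, 3], [4, 5, 6, 7]],
}
for line in summarize_detections(demo_result, ["BG", "person", "bicycle", "car"]):
    print(line)
```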