ios, coreml, coremltools, mlmodel

How to estimate a CoreML model's maximal runtime footprint (in megabytes)


Let's say I have a network model built in TensorFlow/Keras/Caffe etc. I can use the CoreML converters API to get a CoreML model file (.mlmodel) from it.

Now that I have a .mlmodel file and know the input and output shapes, how can the maximum RAM footprint be estimated? I know that a model can have a lot of layers, and their total size can be much bigger than the input/output shapes.

So the questions are:

  1. Can a maximal mlmodel memory footprint be estimated with some formula/API, without compiling and running an app?
  2. Is the maximal footprint closer to the memory size of the biggest intermediate layer, or to the sum of all layers' sizes?

Any advice is appreciated. As I am new to CoreML, feel free to give feedback and I'll try to improve the question if needed.


Solution

  • IMHO, whatever formula you come up with at the end of the day must be based on the number of trainable parameters of the network.

    For classification networks, this number can be found by iterating over the layers, or by using the existing API.

    In Keras:

    import keras.applications.resnet50 as resnet
    
    # Build the ResNet-50 architecture without downloading pretrained weights
    model = resnet.ResNet50(include_top=True, weights=None, input_tensor=None,
                            input_shape=None, pooling=None, classes=2)
    model.summary()  # prints per-layer shapes and the parameter counts below
    
    Total params: 23,591,810
    Trainable params: 23,538,690
    Non-trainable params: 53,120
    

    In PyTorch:

    def count_parameters(model):
        # Sum the element counts of all trainable (requires_grad) tensors
        return sum(p.numel() for p in model.parameters() if p.requires_grad)
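
    For example, applied to a stock torchvision ResNet-50 (a quick sketch; torchvision and its resnet50 constructor are my assumptions, not part of the original answer):

    import torchvision.models as models
    
    # The standard 1000-class ResNet-50 has roughly 25.6M trainable parameters
    resnet50 = models.resnet50(pretrained=False)  # weights=None in newer torchvision
    print(count_parameters(resnet50))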
    

    For detectors, you probably need to do the same for all the important parts of the network, including the backbone, the RPN, etc., whatever your network consists of.
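
    If you only have the converted .mlmodel rather than the source model, the same count can be read from its protobuf spec with coremltools, without compiling or running an app. A rough sketch under assumptions: the model is a float32 neural network, and 'MyModel.mlmodel' is a hypothetical file name.

    import coremltools
    from coremltools.proto.NeuralNetwork_pb2 import WeightParams
    
    def count_spec_params(message):
        # Recursively walk the protobuf and sum the float32 values
        # stored in every WeightParams message (trainable or not)
        if isinstance(message, WeightParams):
            return len(message.floatValue)
        total = 0
        for field, value in message.ListFields():
            if field.type == field.TYPE_MESSAGE:
                items = value if field.label == field.LABEL_REPEATED else [value]
                for item in items:
                    total += count_spec_params(item)
        return total
    
    spec = coremltools.utils.load_spec('MyModel.mlmodel')  # hypothetical path
    print(count_spec_params(spec))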

    The second important parameter is the precision of the network. You may have heard about quantization. It changes the precision of the floats for all or some layers, and can be static (the network is trained in the desired precision and calibrated) or dynamic (the network is converted after training). The simplest dynamic quantization replaces floats with some kind of ints on linear layers. Quantizing Mask R-CNN in PyTorch this way results in a roughly 30% smaller file and a substantial reduction in memory consumption, with the same number of trainable parameters.
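
    Since the target here is a .mlmodel, the weights can also be quantized after conversion with coremltools. A minimal sketch, assuming coremltools 3+ on macOS and a hypothetical file name:

    import coremltools
    from coremltools.models.neural_network import quantization_utils
    
    # Load a previously converted model (hypothetical file name)
    model = coremltools.models.MLModel('MyModel.mlmodel')
    
    # Linearly quantize float32 weights down to 8 bits; this shrinks the
    # weight storage (file size and the weight part of the RAM footprint) ~4x
    quantized = quantization_utils.quantize_weights(model, nbits=8)
    quantized.save('MyModel_quantized.mlmodel')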

    So the final equation is something like size = number_of_trainable_parameters * precision * X, where X is some factor you have to find out for your particular network and CoreML specifics )
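
    As a back-of-the-envelope illustration of that equation (a sketch only; the default factor X = 1.5 is a made-up placeholder you would calibrate against real on-device measurements):

    def estimate_footprint_mb(trainable_params, bytes_per_param=4, x=1.5):
        # bytes_per_param: 4 for float32, 2 for float16, 1 for 8-bit weights
        # x: fudge factor covering activations, buffers and CoreML overhead
        return trainable_params * bytes_per_param * x / (1024 ** 2)
    
    # e.g. ResNet-50 from the summary above: ~23.5M params at float32
    print(estimate_footprint_mb(23538690))  # ~135 MB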