Tags: amazon-web-services, amazon-sagemaker, amazon-machine-learning, amazon-sagemaker-compilers

Which techniques are used by SageMaker Neo for model optimization?


Does SageMaker Neo (the SageMaker compilation job) use any techniques for model optimization? Are any compression techniques (distillation, quantization, etc.) used to reduce the model size?

I found some documentation here (https://docs.aws.amazon.com/sagemaker/latest/dg/neo.html) mentioning quantization, but it's not clear how it is used.

Thanks very much for any insight.


Solution

  • Neo optimizes inference via compilation, which is different from, and often orthogonal to, compression

    At the time of this writing, SageMaker Neo is a managed compilation service. That being said, compilation and compression can be combined: you can prune or distill your network yourself before feeding it to Neo, as sketched below.
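    For illustration, here is a minimal sketch of that workflow, assuming a PyTorch model. Pruning stands in for whatever compression technique you prefer (distillation would work the same way); the bucket, role ARN, and job name are placeholders, not values prescribed by Neo.

    ```python
    import tarfile

    import boto3
    import torch
    import torch.nn.utils.prune as prune
    import torchvision.models as models

    # --- 1. Compress the model before compilation (pruning as one example) ---
    model = models.resnet18(pretrained=True).eval()

    # L1-unstructured pruning: zero out 30% of the smallest conv weights.
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=0.3)
            prune.remove(module, "weight")  # make the pruning permanent

    # Neo's PyTorch path expects a traced (TorchScript) model inside a tarball.
    example_input = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example_input)
    traced.save("model.pth")
    with tarfile.open("model.tar.gz", "w:gz") as tar:
        tar.add("model.pth")

    # --- 2. Hand the compressed artifact to Neo ---
    # Assumes model.tar.gz has already been uploaded to the S3 URI below;
    # the bucket, role ARN, and job name are placeholders.
    sm = boto3.client("sagemaker")
    sm.create_compilation_job(
        CompilationJobName="pruned-resnet18-neo",
        RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
        InputConfig={
            "S3Uri": "s3://my-bucket/model.tar.gz",
            "DataInputConfig": '{"input0": [1, 3, 224, 224]}',
            "Framework": "PYTORCH",
        },
        OutputConfig={
            "S3OutputLocation": "s3://my-bucket/compiled/",
            "TargetDevice": "ml_c5",
        },
        StoppingCondition={"MaxRuntimeInSeconds": 900},
    )
    ```

    The point of the ordering is that compression changes the model artifact itself, while compilation takes whatever artifact it is given, so Neo sees the pruned network as just another input model.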

    SageMaker Neo covers a wide range of hardware targets and model architectures, and consequently leverages numerous backends and optimizations. Neo internals are publicly documented in several places: