machine-learning, conv-neural-network, convolution, semantic-segmentation, unet-neural-network

Why is dilated convolution computationally efficient?


I have been studying the UNet-inspired architecture ENet, and I think I follow the basic concepts. The bedrock of ENet's efficiency is dilated convolution (among other things). I understand how it preserves spatial resolution and how it is computed, but I can't understand why it is computationally and memory-wise less expensive than, e.g., max-pooling.

ENet: https://arxiv.org/pdf/1606.02147.pdf


Solution

  • With a dilated convolution layer you simply skip a whole computational layer (e.g. a pooling step), because a single dilated kernel already covers the enlarged receptive field:

    For example, a dilated convolution with a 3×3 kernel and dilation rate 2 is comparable to a standard 5×5 convolution: both see a 5×5 neighbourhood, but the dilated kernel uses only 9 weights instead of 25 (a concrete comparison is sketched after this list).

    For further reference, look at the amazing paper by Vincent Dumoulin and Francesco Visin: A guide to convolution arithmetic for deep learning

    The GitHub repository for this paper also contains an animation of how dilated convolution works: https://github.com/vdumoulin/conv_arithmetic
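
To make the saving concrete, here is a minimal sketch (PyTorch is assumed purely for illustration and is not part of the original answer; channel counts and input size are arbitrary): a 3×3 convolution with dilation 2 keeps the same output resolution and per-layer receptive field as a 5×5 convolution, while needing 9 weights per filter instead of 25.

    # Minimal sketch, assuming PyTorch is installed; channel counts are illustrative.
    import torch
    import torch.nn as nn

    in_ch, out_ch = 16, 16

    # 3x3 kernel with dilation 2: effective kernel size = 3 + (3 - 1) * (2 - 1) = 5
    dilated = nn.Conv2d(in_ch, out_ch, kernel_size=3, dilation=2, padding=2)

    # Standard 5x5 convolution covering the same 5x5 receptive field per layer
    standard = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)

    def n_params(module):
        return sum(p.numel() for p in module.parameters())

    x = torch.randn(1, in_ch, 64, 64)
    print(dilated(x).shape, standard(x).shape)    # same spatial resolution: 64x64
    print(n_params(dilated), n_params(standard))  # 2320 vs 6416 parameters

The multiply-accumulate count per output pixel scales the same way (9 taps vs 25 taps), which is where the compute and memory saving comes from.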