ios, neural-network, swift4, yolo, metal-performance-shaders

Transforming MPSNNImageNode using Metal Performance Shader


I am currently working on replicating YOLOv2 (not tiny) on iOS (Swift4) using MPS.

The problem is that I find it hard to implement the space_to_depth function (https://www.tensorflow.org/api_docs/python/tf/space_to_depth) and the concatenation of two convolution outputs (13x13x256 + 13x13x1024 -> 13x13x1280). Could you give me some advice on implementing these parts? My code is below.

...



let conv19 = MPSCNNConvolutionNode(source: conv18.resultImage,
                                   weights: DataSource("conv19", 3, 3, 1024, 1024))

let conv20 = MPSCNNConvolutionNode(source: conv19.resultImage,
                                   weights: DataSource("conv20", 3, 3, 1024, 1024))

let conv21 = MPSCNNConvolutionNode(source: conv13.resultImage,
                                   weights: DataSource("conv21", 1, 1, 512, 64))



/*****
    1. space_to_depth on the result of conv21
    2. concatenate the result of conv20 (13x13x1024) with the result of step 1 (13x13x256)
    I need your help to implement this part!
******/

Solution

    1. I believe space_to_depth can be expressed in the form of a convolution: for instance, for an input of shape [1,2,2,1], use 4 convolution kernels that each write one value to one output channel, i.e. [[1,0],[0,0]], [[0,1],[0,0]], [[0,0],[1,0]], [[0,0],[0,1]]. This moves every value from the spatial dimensions into the depth dimension. A Swift sketch of such a fixed-weight convolution follows after this list.

    2. MPS actually has a concatenation node. See here: https://developer.apple.com/documentation/metalperformanceshaders/mpsnnconcatenationnode

      In Objective-C you can use it like this: concatNode = [[MPSNNConcatenationNode alloc] initWithSources:@[layerA.resultImage, layerB.resultImage]]; (a Swift version of this call is shown after the sketch below).
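
Here is a minimal Swift sketch of point 1, assuming a 2x2 block and the 64 channels coming out of conv21. The class name SpaceToDepthDataSource is my own (it is not the DataSource class from the question), the weights are laid out in the [outputChannels][kernelHeight][kernelWidth][inputChannels] order MPSCNNConvolution expects, and the output-channel ordering follows tf.space_to_depth (block position first, then input channel), so adjust it if your pretrained weights assume Darknet's reorg ordering:

import MetalPerformanceShaders

// Sketch: a convolution data source whose fixed weights implement a 2x2,
// stride-2 space_to_depth. Every output value is a single input value
// moved from a spatial position into a depth channel.
class SpaceToDepthDataSource: NSObject, MPSCNNConvolutionDataSource {
    private let inputChannels: Int
    private let blockSize = 2
    private var weightBuffer: UnsafeMutablePointer<Float>?

    init(inputChannels: Int) {
        self.inputChannels = inputChannels
        super.init()
    }

    func dataType() -> MPSDataType { return .float32 }

    func descriptor() -> MPSCNNConvolutionDescriptor {
        // 2x2 kernel with stride 2, e.g. 26x26x64 -> 13x13x256.
        let desc = MPSCNNConvolutionDescriptor(kernelWidth: blockSize,
                                               kernelHeight: blockSize,
                                               inputFeatureChannels: inputChannels,
                                               outputFeatureChannels: inputChannels * blockSize * blockSize)
        desc.strideInPixelsX = blockSize
        desc.strideInPixelsY = blockSize
        return desc
    }

    func load() -> Bool {
        let outputChannels = inputChannels * blockSize * blockSize
        let count = outputChannels * blockSize * blockSize * inputChannels
        let buffer = UnsafeMutablePointer<Float>.allocate(capacity: count)
        buffer.initialize(repeating: 0, count: count)
        // Weight layout: [outputChannel][kernelY][kernelX][inputChannel].
        // Output channel (ky * blockSize + kx) * inputChannels + c copies input
        // channel c from block position (ky, kx).
        for ky in 0..<blockSize {
            for kx in 0..<blockSize {
                for c in 0..<inputChannels {
                    let o = (ky * blockSize + kx) * inputChannels + c
                    let index = ((o * blockSize + ky) * blockSize + kx) * inputChannels + c
                    buffer[index] = 1
                }
            }
        }
        weightBuffer = buffer
        return true
    }

    func weights() -> UnsafeMutableRawPointer {
        return UnsafeMutableRawPointer(weightBuffer!)
    }

    func biasTerms() -> UnsafeMutablePointer<Float>? { return nil }

    func purge() {
        weightBuffer?.deallocate()
        weightBuffer = nil
    }

    func label() -> String? { return "space_to_depth" }

    // Newer SDKs require NSCopying on the data source protocol.
    func copy(with zone: NSZone? = nil) -> Any {
        return SpaceToDepthDataSource(inputChannels: inputChannels)
    }
}

// Wired into the graph like any other convolution node:
// let reorg = MPSCNNConvolutionNode(source: conv21.resultImage,
//                                   weights: SpaceToDepthDataSource(inputChannels: 64))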
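
And the Swift spelling of the concatenation call from point 2, using conv20 from the question and the hypothetical reorg node produced by the sketch above:

// Concatenates along the feature-channel dimension:
// 13x13x1024 + 13x13x256 -> 13x13x1280.
let concat = MPSNNConcatenationNode(sources: [conv20.resultImage, reorg.resultImage])
// concat.resultImage then feeds the remaining YOLOv2 layers.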