objective-c, macos, metal, metalkit, metal-performance-shaders

How do you synchronize a Metal Performance Shader with an MTLBlitCommandEncoder?


I'm trying to better understand the synchronization requirements when working with Metal Performance Shaders and an MTLBlitCommandEncoder.

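I have an MTLCommandBuffer that is set up as follows:

  1. An MTLBlitCommandEncoder copies a tile from Texture A into Texture B.
  2. An MPSImageBilinearScale shader scales Texture B into Texture C.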

How do I ensure that the blit encoder completely finishes copying the data from Texture A to Texture B before the metal performance shader starts trying to scale Texture B? Do I even have to worry about this or does the serial nature of a command buffer take care of this for me already?

Metal has the concept of fences, using MTLFence, for synchronizing access to resources, but I don't see any way to have a Metal Performance Shader wait on a fence. (Whereas waitForFence: is present on the encoders.)

If I can't use fences and I do need to synchronize, is the recommended practice to enqueue the blit encoder, call waitUntilCompleted on the command buffer, then enqueue the shader and call waitUntilCompleted a second time? For example:

id<MTLCommandBuffer> commandBuffer;

// Enqueue blit encoder to copy Texture A -> Texture B
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder copyFromTexture:...];
[blitEncoder endEncoding];

// Wait for blit encoder to complete.
[commandBuffer commit];
[commandBuffer waitUntilCompleted];

// Scale Texture B -> Texture C
MPSImageBilinearScale *imageScaleShader = [[MPSImageBilinearScale alloc] initWithDevice:...];  
[imageScaleShader encodeToCommandBuffer:commandBuffer...];

// Wait for scaling shader to complete.
[commandBuffer commit];
[commandBuffer waitUntilCompleted];

The reason I think I need to do the intermediary copy into Texture B is because MPSImageBilinearScale appears to scale its entire source texture. The clipOffset is useful for output, but it doesn't apply to the actual scaling or transform. So the tile needs to be extracted from Texture A into Texture B that is the same size as the tile itself. Then the scaling and transform will "make sense". Disregard this footnote because I had forgotten some basic math principles and have since figured out how to make the scale transform's translate properties work with the clipRect.


Solution

  • Metal takes care of this for you. The driver and GPU execute commands in a command buffer as though in serial fashion. (The "as though" allows for running things in parallel or out of order for efficiency, but only if the result would be the same as when done serially.)

    Synchronization issues arise when both the CPU and GPU are working with the same objects, and also when presenting textures on screen. (You shouldn't be rendering to a texture that's being presented on screen.)

    There's a section of the Metal Programming Guide which deals with read-write access to resources by shaders, which is not exactly the same, but should reassure you:

    Memory Barriers

    Between Command Encoders

    All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.

    Within a Render Command Encoder

    For buffers, atomic writes are visible to subsequent atomic reads across multiple threads.

    For textures, the textureBarrier method ensures that writes performed in a given draw call are visible to subsequent reads in the next draw call.

    Within a Compute Command Encoder

    All resource writes performed in a given kernel function are visible in the next kernel function.
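
To make the answer concrete, here is a minimal sketch of the whole operation encoded into a single command buffer with one commit. The helper name CopyTileThenScale and its parameters are hypothetical, not part of any Apple API. It assumes the textures were created directly from the device (so Metal's default hazard tracking applies, unlike untracked heap resources), that Texture A and Texture B share a pixel format, that Texture B allows shader reads and Texture C allows shader writes, and that the tile is the size of Texture B.

```objc
#import <Metal/Metal.h>
#import <MetalPerformanceShaders/MetalPerformanceShaders.h>

// Hypothetical helper: copies a tile of textureA into textureB, then scales
// textureB into textureC, using one command buffer and a single commit.
// Returns YES if the command buffer finished without an error.
static BOOL CopyTileThenScale(id<MTLCommandQueue> queue,
                              id<MTLTexture> textureA,
                              id<MTLTexture> textureB,
                              id<MTLTexture> textureC,
                              MTLOrigin tileOrigin)
{
    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];

    // 1. Blit pass: copy the tile from Texture A into Texture B.
    //    Assumption: the tile has the same dimensions as Texture B.
    id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
    [blitEncoder copyFromTexture:textureA
                     sourceSlice:0
                     sourceLevel:0
                    sourceOrigin:tileOrigin
                      sourceSize:MTLSizeMake(textureB.width, textureB.height, 1)
                       toTexture:textureB
                destinationSlice:0
                destinationLevel:0
               destinationOrigin:MTLOriginMake(0, 0, 0)];
    [blitEncoder endEncoding];

    // 2. MPS pass: encoded into the SAME command buffer, after the blit.
    //    No fence and no intermediate commit; the blit's writes to Texture B
    //    are visible to the encoder that MPS creates internally.
    MPSImageBilinearScale *scale =
        [[MPSImageBilinearScale alloc] initWithDevice:queue.device];
    [scale encodeToCommandBuffer:commandBuffer
                   sourceTexture:textureB
              destinationTexture:textureC];

    // 3. Commit once. Waiting is only needed if the CPU is about to use the
    //    result; otherwise a completion handler avoids blocking the thread.
    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    return commandBuffer.error == nil;
}
```

Note that a command buffer can only be committed once, so the commit / encode / commit pattern from the question would not work as written; it would need a second command buffer, which is unnecessary here.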