ios swift metal metal-performance-shaders

Swift Metal Compute Shaders giving unexpected blending result


I am learning Metal and creating a small application on my iPhone XR with Xcode 13.4.1 (iOS target 15.5).

I render a simple sphere and I am attempting to create a 'glow' or 'bloom' effect as described in this post: https://weblog.jamisbuck.org/2016/2/27/bloom-effect-in-metal.html

I am successfully using Metal Performance Shaders to perform the Gaussian blur and render only the 'blurred' outline of my sphere into a texture. The output texture of this process has the same dimensions as my current 'drawable'.

My problem is that when I later use the compute shader to blend the Gaussian texture with my scene texture, the two are combined, but the Gaussian input texture appears to be shrunk to 25% of its input size and therefore covers only the top-left quarter of my scene. Please see the attached screenshot for a better description.

Why is the blending working this way and only rendering in the top left quadrant? How can I fix it? Thanks for any help.

I've pasted the relevant code below. It covers everything after my 3D scene rendering: the compute shader blending and the presentation to the screen.

        // Do the drawing of the 3d scene
        commandEncoder.endEncoding()
        
        print("Before Texture: \(drawable.texture.width) \(drawable.texture.height)")
        
        if true {
            
            // Now create a compute shader to blend the blurred mask texture
            // with the framebuffer
            let computeFunction = self.computeLibrary.makeFunction(name: "ComputeBlender")
            
            // Setup compute encoder
            let computePipelineState = try? device.makeComputePipelineState(function: computeFunction!)
            let computeEncoder = commandBuffer.makeComputeCommandEncoder()
            computeEncoder?.setComputePipelineState(computePipelineState!)
            
            // input one -- primary texture
            computeEncoder?.setTexture(drawable.texture, index: 0)
            // input two -- mask texture
            computeEncoder?.setTexture(self.debugImageA, index: 1)
            // output texture
            computeEncoder?.setTexture(drawable.texture, index: 2)

            if false {
                let threadsPerGroup = MTLSizeMake(1, 1, 1)
                let w = Int(view.drawableSize.width)
                let h = Int(view.drawableSize.height)
                let threadsPerGrid = MTLSizeMake(w, h, 1)
                computeEncoder?.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerGroup)
            } else {
                let viewWidth = Int(view.bounds.size.width)
                let viewHeight = Int(view.bounds.size.height)

                // set up the thread group size (1x1x1 here)
                let threadGroupSize = MTLSize(width: 1, height: 1, depth: 1)

                // define the number of such groups needed to process the textures
                let numGroups = MTLSize(
                  width: viewWidth/threadGroupSize.width+1,
                  height: viewHeight/threadGroupSize.height+1,
                  depth: 1)

                computeEncoder?.dispatchThreadgroups(numGroups,
                  threadsPerThreadgroup: threadGroupSize)
            }

            computeEncoder?.endEncoding()
            
        }
        
        // Capture debug texture
        self.debugImageB = drawable.texture
        
        // Present the final/blended image
        commandBuffer.present(drawable)
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
        
        print("After Texture: \(self.debugImageB.width) \(self.debugImageB.height)")

Here is the shader, which matches the one in the post. Please note that I rearranged the color channels so that the final image makes the incorrect blending visible:

#include <metal_stdlib>
using namespace metal;

kernel void ComputeBlender(
  texture2d<float, access::read> source [[ texture(0) ]],
  texture2d<float, access::read> mask [[ texture(1) ]],
  texture2d<float, access::write> dest [[ texture(2) ]],
  uint2 gid [[ thread_position_in_grid ]])
{
  float4 source_color = source.read(gid);
  float4 mask_color = mask.read(gid);
  float4 result_color = source_color + mask_color;

  // Rotate the color channels so the blended region stands out in the output
  result_color = float4(result_color.g, result_color.b, result_color.r, 1.0);

  dest.write(result_color, gid);
}

[Screenshot: the blended output, with the blurred mask covering only the top-left quarter of the scene]


Solution

  • Your thread position logic is not correctly implemented in your shader. You should take into account the size of your textures.

    uint2 uv = uint2(gid.x + mask.get_width(), gid.y + mask.get_height());
    dest.write(result_color, uv);
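
On the Swift side, taking the texture size into account could also mean deriving the dispatch size from the texture's pixel dimensions (for example `drawable.texture.width` and `drawable.texture.height`) instead of `view.bounds`, which UIKit measures in points and which can therefore be smaller than the drawable. The sketch below is only an illustration, not the code from the question or from the linked post: `threadgroupCount` is a hypothetical helper, and the 828 x 1792 texture and 8 x 8 threadgroup size are assumed example values.

```swift
// Hypothetical helper (not part of the original post): the number of
// threadgroups needed to cover `pixels` texels with groups of `groupSize`
// threads, rounding up so a partial group at the edge is still dispatched.
func threadgroupCount(pixels: Int, groupSize: Int) -> Int {
    precondition(pixels >= 0 && groupSize > 0, "sizes must be valid")
    return (pixels + groupSize - 1) / groupSize
}

// Assumed example values: in a real app these would come from the texture
// itself, e.g. drawable.texture.width and drawable.texture.height.
let textureWidth = 828
let textureHeight = 1792
let groupSide = 8

let groupsWide = threadgroupCount(pixels: textureWidth, groupSize: groupSide)
let groupsHigh = threadgroupCount(pixels: textureHeight, groupSize: groupSide)
print("threadgroups: \(groupsWide) x \(groupsHigh)")
```

The two counts would then go into `MTLSize(width: groupsWide, height: groupsHigh, depth: 1)` for `dispatchThreadgroups(_:threadsPerThreadgroup:)`. Because rounding up can launch threads past the texture edge, the kernel would normally return early when `gid.x >= dest.get_width()` or `gid.y >= dest.get_height()`.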