[SOLVED] Metal compute shaders threadgroup & threadExecutionWidth

Can someone explain in simple terms what a threadgroup conceptually is in Metal compute shaders, along with related terms such as SIMD group and threadExecutionWidth (wavefront)? I read the docs but came away more confused. For instance, if I have a 1024x1024 image, how many threadgroups can I have, how do I map a thread to each pixel, how many threads can run concurrently, etc.? I can't find a WWDC video that describes compute shaders and these concepts.

Solution

A threadgroup is a group of threads that work together to solve a certain (sub)problem. A threadgroup can hold at most 512 or 1024 threads, depending on the device you're using; the exact limit for a given kernel is available as maxTotalThreadsPerThreadgroup on the compute pipeline state.

The threadExecutionWidth is the size of the SIMD groups the GPU uses to execute your kernel. It's typically 32, meaning each SIMD group has 32 threads in it. For optimal performance, the number of threads in your threadgroup should be a multiple of threadExecutionWidth. (A SIMD group is what other platforms call a wavefront or warp; threadExecutionWidth is its size.)
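To see why a multiple of threadExecutionWidth is recommended, here is a minimal sketch of the arithmetic in plain C++. It is not Metal API code: the helper names are made up for illustration, and it assumes a partly filled SIMD group still occupies a whole SIMD group on the hardware.

```cpp
#include <cassert>

// Hypothetical helper: how many SIMD groups a threadgroup occupies.
// Assumption: a partly filled SIMD group still takes up a whole one, so round up.
inline unsigned simdGroupsPerThreadgroup(unsigned threadsPerThreadgroup,
                                         unsigned executionWidth) {
    assert(executionWidth > 0);
    return (threadsPerThreadgroup + executionWidth - 1) / executionWidth;
}

// Hypothetical helper: lanes in the last SIMD group that have no thread to run.
inline unsigned idleLanes(unsigned threadsPerThreadgroup,
                          unsigned executionWidth) {
    return simdGroupsPerThreadgroup(threadsPerThreadgroup, executionWidth)
               * executionWidth
           - threadsPerThreadgroup;
}
```

With an execution width of 32, a threadgroup of 512 threads fills 16 SIMD groups exactly, while a threadgroup of 500 threads still needs 16 SIMD groups and leaves 12 lanes unused.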

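If you have a 1024x1024 image and you want one thread to process one pixel, and the maximum threadgroup size is 512, then you can create a grid of 1024x1024 threads that consists of 32x64 threadgroups of size 32x16 (i.e. 512 threads each). Each thread can then find its pixel from its position in the grid, which a kernel function receives through the thread_position_in_grid attribute.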
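The numbers above can be checked with a short sketch, again in plain C++ rather than Metal API code. Size2 and both function names are hypothetical. The second function mirrors how a thread's position in the grid relates to its threadgroup's position and its position inside that threadgroup.

```cpp
#include <cassert>

// Hypothetical 2D size/coordinate type for illustration.
struct Size2 { unsigned x, y; };

// Threadgroups needed along each axis, rounded up so the whole image is covered.
inline Size2 threadgroupsPerGrid(Size2 image, Size2 threadsPerThreadgroup) {
    assert(threadsPerThreadgroup.x > 0 && threadsPerThreadgroup.y > 0);
    return { (image.x + threadsPerThreadgroup.x - 1) / threadsPerThreadgroup.x,
             (image.y + threadsPerThreadgroup.y - 1) / threadsPerThreadgroup.y };
}

// Pixel handled by one thread: the origin of its threadgroup plus its offset
// inside that threadgroup.
inline Size2 pixelForThread(Size2 threadgroupPosition,
                            Size2 threadsPerThreadgroup,
                            Size2 positionInThreadgroup) {
    return { threadgroupPosition.x * threadsPerThreadgroup.x + positionInThreadgroup.x,
             threadgroupPosition.y * threadsPerThreadgroup.y + positionInThreadgroup.y };
}
```

For a 1024x1024 image and 32x16 threadgroups this gives 32x64 threadgroups, and the last thread of the last threadgroup lands on pixel (1023, 1023). The rounding up only matters when the image size is not a multiple of the threadgroup size; in that case some threads fall outside the image and the kernel would need a bounds check.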

But really, you can divide up the threads however you want. You could also have a grid of 2x1024 threadgroups of size 512x1, or whatever.
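As a rough check that different divisions describe the same grid, the sketch below tests whether a layout covers an image exactly while staying within the maximum threadgroup size. It is plain C++ with hypothetical names (Layout, coversExactly), not part of the Metal API.

```cpp
#include <cassert>

// Hypothetical description of a dispatch.
struct Layout {
    unsigned groupsX, groupsY;    // threadgroups in the grid
    unsigned threadsX, threadsY;  // threads in each threadgroup
};

inline unsigned threadsPerThreadgroup(const Layout& l) {
    return l.threadsX * l.threadsY;
}

// True if the layout covers exactly width x height threads and each
// threadgroup stays within the device limit.
inline bool coversExactly(const Layout& l, unsigned width, unsigned height,
                          unsigned maxThreadsPerThreadgroup) {
    assert(maxThreadsPerThreadgroup > 0);
    return l.groupsX * l.threadsX == width
        && l.groupsY * l.threadsY == height
        && threadsPerThreadgroup(l) <= maxThreadsPerThreadgroup;
}
```

Both 32x64 threadgroups of 32x16 and 2x1024 threadgroups of 512x1 pass this check for a 1024x1024 image with a limit of 512. A single row of 1x1024 threadgroups of size 1024x1 covers the image too, but fails the check because each threadgroup would exceed 512 threads.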