Can someone explain in simple terms what threadgroup conceptually is in Metal compute shaders and other terms such as SIMD group, threadExecutionWidth (wavefront)? I read the docs but am more confused. For instance, if I have a 1024x1024 image, how many threadgroups can I have, how can I map thread to each pixel, how many can run concurrently, etc.? I can't find WWDC video describing compute shaders and these concepts.
A threadgroup is a group of threads that work together to solve a certain (sub)problem. You can have a maximum of 512
or 1024
threads in a threadgroup (depending on the device you're using).
The threadExecutionWidth
is the size of the SIMD groups used. It's typically 32
, meaning each SIMD group has 32
threads in it. For optimal performance, the number of threads in your threadgroup should be a multiple of threadExecutionWidth
. (This is indeed what others call the wavefront or warp.)
If you have a 1024x1024
image and you want one thread to process one pixel, and the maximum threadgroup size is 512
, then you can create a grid of 1024x1024
threads that consists of 32x64
threadgroups of size 32x16
(i.e. 512
).
But really, you can divide up the threads however you want. You could also have a grid of 2x1024
threadgroups of size 512x1
, or whatever.