What is the fastest way to count how many transparent pixels exist in a `CIImage`/`UIImage`?

My first thought, if we speak about efficiency, is to use a Metal kernel (via `CIColorKernel` or the like), but I can't understand how to use it to output a "count".

Other ideas I had in mind: `CIAreaAverage`? Some other `CIFilter`? Reading the RGB values directly? What is the fastest way to achieve this count?
What you want to perform is a reduction operation, which is not necessarily well-suited for the GPU due to its massively parallel nature. I'd recommend not writing a reduction operation for the GPU yourself, but rather using the highly optimized built-in APIs that Apple provides (like `CIAreaAverage` or the corresponding Metal Performance Shaders).
The most efficient way depends a bit on your use case, specifically where the image comes from (loaded via `UIImage`/`CGImage`, or the result of a Core Image pipeline?) and where you need the resulting count (on the CPU/Swift side, or as an input for another Core Image filter?).
It also depends on whether the pixels could be semi-transparent (alpha not exactly `0.0` or `1.0`).
If the image is on the GPU and/or the count should be used on the GPU, I'd recommend using `CIAreaAverage`. The alpha value of the result reflects the percentage of transparent pixels. Note that this only works if there are no semi-transparent pixels.
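A minimal sketch of the `CIAreaAverage` approach, assuming a binary alpha channel (every pixel either fully transparent or fully opaque); the 1×1 readback and the default `CIContext` here are illustrative choices, not the only way to set this up:

```swift
import CoreImage

/// Returns the fraction of transparent pixels in `image`,
/// assuming every pixel's alpha is either 0.0 or 1.0.
func transparentFraction(of image: CIImage, context: CIContext = CIContext()) -> Double? {
    let extent = image.extent
    guard !extent.isInfinite else { return nil }

    // CIAreaAverage reduces the whole extent to a single average pixel.
    guard let filter = CIFilter(name: "CIAreaAverage",
                                parameters: [kCIInputImageKey: image,
                                             kCIInputExtentKey: CIVector(cgRect: extent)]),
          let output = filter.outputImage else { return nil }

    // Read the single result pixel back to the CPU.
    var pixel = [UInt8](repeating: 0, count: 4)
    context.render(output,
                   toBitmap: &pixel,
                   rowBytes: 4,
                   bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
                   format: .RGBA8,
                   colorSpace: nil)

    // Average alpha 255 means fully opaque everywhere; 0 means fully transparent.
    return 1.0 - Double(pixel[3]) / 255.0
}
```

Multiplying the result by the pixel count of the image's extent gives an absolute number of transparent pixels.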
The next best solution is probably just iterating the pixel data on the CPU. It might be a few million pixels, but the operation itself is very fast, so this should take almost no time. You could even use multi-threading by splitting the image up into chunks and using `concurrentPerform(...)` of `DispatchQueue`.
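A sketch of that CPU approach on raw RGBA8 pixel data; the chunking scheme and the assumption that alpha is the fourth byte of each pixel are mine, so adapt them to your image's actual byte layout:

```swift
import Foundation

/// Counts pixels with alpha == 0 in a raw RGBA8 buffer,
/// splitting the work across multiple threads.
func countTransparentPixels(rgba: [UInt8], chunkCount: Int = 8) -> Int {
    let pixelCount = rgba.count / 4
    var partialCounts = [Int](repeating: 0, count: chunkCount)

    partialCounts.withUnsafeMutableBufferPointer { partials in
        DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
            let start = chunk * pixelCount / chunkCount
            let end = (chunk + 1) * pixelCount / chunkCount
            var count = 0
            for pixel in start..<end where rgba[pixel * 4 + 3] == 0 {
                count += 1
            }
            // Each chunk writes only to its own slot, so no locking is needed.
            partials[chunk] = count
        }
    }
    return partialCounts.reduce(0, +)
}
```

Each iteration works on a disjoint slice and writes its result to a separate slot, which is what makes the lock-free parallel sum safe.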
A last, but probably overkill, solution would be to use Accelerate (this would make @FlexMonkey happy): load the image's pixel data into a vDSP buffer and use the `sum` or `average` methods to calculate the percentage using the CPU's vector units.
Clarification
When I was saying that a reduction operation is "not necessarily well-suited for the GPU", I meant that it's rather complicated to implement efficiently and far from as straightforward as a sequential algorithm.
The check whether a pixel is transparent or not can be done in parallel, sure, but the results need to be gathered into a single value, which requires multiple GPU cores reading and writing values into the same memory. This usually requires some synchronization (and thereby hinders parallel execution) and incurs latency cost due to access to the shared or global memory space. That's why efficient gather algorithms for the GPU usually follow a multi-step tree-based approach. I can highly recommend reading NVIDIA's publications on the topic (e.g. here and here). That's also why I recommended using built-in APIs when possible since Apple's Metal team knows how to best optimize these algorithms for their hardware.
There is also an example reduction implementation in Apple's Metal Shading Language Specification (pp. 158) that uses `simd_shuffle` intrinsics for efficiently communicating intermediate values down the tree. The general principle is the same as described in NVIDIA's publications linked above, though.