I am currently trying out Halide, playing around with calculating the maximum/minimum over all channels of an image. I would like to achieve this for a arbitrary images, where the amount of channels is only known at runtime.
I successfully got the following solution:
#include "Halide.h"
#include "halide_image_io.h"
using namespace Halide::Tools;
int main(int argc, char **argv) {
Halide::Buffer<uint8_t> input = load_image("rgb.png");
Halide::Var x, y;
Halide::Func max_channels, min_channels;
max_channels(x, y) = input(x, y, 0);
min_channels(x, y) = input(x, y, 0);
for (int i = 1; i < input.channels(); ++i)
{
max_channels(x, y) = max(max_channels(x, y), input(x, y, i));
min_channels(x, y) = min(min_channels(x, y), input(x, y, i));
}
{
Halide::Buffer<uint8_t> output =
max_channels.realize({input.width(), input.height(), 1});
save_image(output, "maximum.png");
}
{
Halide::Buffer<uint8_t> output =
min_channels.realize({input.width(), input.height(), 1});
save_image(output, "minimum.png");
}
printf("Success!\n");
return 0;
}
However, I was wondering if it would be possible to achieve this without the explicit for
loop. Going by the documentation, it should be possible to achieve this using the Halide::RDom
class. From the examples given there, I would think that the following should work.
#include "Halide.h"
#include "halide_image_io.h"
using namespace Halide::Tools;
int main(int argc, char **argv) {
Halide::Buffer<uint8_t> input = load_image("rgb.png");
Halide::Var x, y;
Halide::Func max_channels, min_channels;
Halide::RDom r(input);
min_channels(x, y) = uint8_t{255};
max_channels(x, y) = uint8_t{0};
min_channels(r.x, r.y) = minimum(input(r.x, r.y, r.z));
max_channels(r.x, r.y) = maximum(input(r.x, r.y, r.z));
{
Halide::Buffer<uint8_t> output =
max_channels.realize({input.width(), input.height(), 1});
save_image(output, "maximum.png");
}
{
Halide::Buffer<uint8_t> output =
min_channels.realize({input.width(), input.height(), 1});
save_image(output, "minimum.png");
}
printf("Success!\n");
return 0;
}
This compiles, but unfortunately crashes with the following error message:
terminate called after throwing an instance of 'Halide::CompileError'
what(): Error: In update definition 0 of Func "f1":
Tuple element 0 of update definition has type uint8, but pure definition has type int32
Aborted (core dumped)
This message does not make any sense to me, as all the values involved are uint8_t
, so I am not sure where the programme is getting its int32_t
from.
Thus, my question: What is the correct way to calculate the maximum/minimum over all channels of an image using domains? Or is this something that is not possible, and a for
loop has to be used instead?
Here's a way using the maximum
and minimum
helpers:
#include <Halide.h>
#include <halide_image_io.h>
using namespace Halide;
using namespace Halide::Tools;
int main() {
Buffer<uint8_t> input = load_image("rgb.png");
Var x, y;
RDom r_chan(0, input.dim(2).extent());
Func min_img;
Func max_img;
min_img(x, y) = minimum(input(x, y, r_chan));
max_img(x, y) = maximum(input(x, y, r_chan));
Buffer<uint8_t> min_out = min_img.realize({input.width(), input.height()});
Buffer<uint8_t> max_out = max_img.realize({input.width(), input.height()});
save_image(min_out, "min_out.png");
save_image(max_out, "max_out.png");
}
You just need to create a reduction domain that spans the number of channels (given by input.dim(2).extent()
). If you don't want to use the helpers, you can instead write (for example with min_img
):
min_img(x, y) = cast<uint8_t>(255);
min_img(x, y) = min(input(x, y, r_chan), min_img(x, y));
The way reduction domains work (absent other scheduling directives) is they insert an innermost loop that repeats the rule for each point in the reduction domain. Thus the loops in this second example look like this:
for y:
for x:
min_img(...) = ...
for y:
for x:
for r_chan in [0, 2]:
min_img(...) = ...
If you're worried about the optimizer not eliminating the first loop where it initializes everything to 255
, then you could instead write:
min_img(x, y) = undef(type_of<uint8_t>()); // skip the pure step
min_img(x, y) = min(
input(x, y, r_chan),
select(r_chan == 0, cast<uint8_t>(255), min_img(x, y))
);
I would expect LLVM to peel the first iteration (at least) in this case.