I have a Halide generator with the following input and output:
Input<Buffer<uint8_t>> input{"input", 2};
Output<Buffer<uint8_t>> output{"output", 2};
In the generate() method I defined the following algorithm:
output(c,x) = Halide::cast<uint8_t> (input(mux(c, {1,0,2,3,0,2}), x));
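For reference, the mux() expression above can be reproduced in plain C++ (no Halide needed), which is handy for checking the generator's output against known values. This is a sketch under assumptions the post does not state: the input is packed UYVY (U, Y0, V, Y1 in each 4-byte group), c is the stride-1 dimension of both buffers, and decode_reference is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ reference for: output(c, x) = input(mux(c, {1,0,2,3,0,2}), x)
// Assumes packed UYVY input (U, Y0, V, Y1 per 4-byte group) with c as the
// stride-1 dimension. Each group becomes 6 output bytes: Y0 U V Y1 U V.
std::vector<uint8_t> decode_reference(const std::vector<uint8_t>& input) {
    assert(input.size() % 4 == 0 && "input must contain whole 4-byte groups");
    // Same table as the mux() call: output channel c reads input channel kSrc[c].
    constexpr std::array<std::size_t, 6> kSrc = {1, 0, 2, 3, 0, 2};
    const std::size_t groups = input.size() / 4;
    std::vector<uint8_t> output(6 * groups);
    for (std::size_t x = 0; x < groups; ++x) {
        for (std::size_t c = 0; c < 6; ++c) {
            // Input strides: c -> 1, x -> 4. Output strides: c -> 1, x -> 6.
            output[6 * x + c] = input[4 * x + kSrc[c]];
        }
    }
    return output;
}
```

For example, the two groups {10, 20, 30, 40, 50, 60, 70, 80} decode to {20, 10, 30, 40, 10, 30, 60, 50, 70, 80, 50, 70}. Note that the output is 1.5 times the size of the input.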
The problem is that when I pass the input and output buffers from the main program, I get the desired output, but the input buffer is also modified, which I would like to avoid. I tried defining intermediate Funcs and applying the algorithm to those, but the effect is the same:
Func decode;
Func in;
in(c,x) = input(c,x);
decode(c,x) = Halide::cast<uint8_t> (in(mux(c, {1,0,2,3,0,2}), x));
output(c,x) = decode(c,x);
...
I also tried to create a copy of the input buffer from Input<Buffer<uint8_t>> input{"input", 2} like this:
in(c,x) = input(c,x);
Halide::Buffer<uint8_t> in_copy = in.realize({Halide::Internal::as_const_int(input.dim(0).extent()), Halide::Internal::as_const_int(input.dim(1).extent())});
but this results in "Unhandled exception: Error: Buffer argument input is nullptr", which is understandable: at generation time the input buffer is not yet bound to any data. Do you have any suggestions for how to avoid mutating the input buffer?
@Alex asked me to post a compilable generator, so here is a version that uses the intermediate Funcs:
#include "Halide.h"
using namespace Halide;
class Yuv422Decoder : public Halide::Generator<Yuv422Decoder> {
public:
Input<Buffer<uint8_t>> input{"input", 2};
Output<Buffer<uint8_t>> output{"output", 2};
Var c,x,xo,xi,co,ci;
void generate() {
Func decode;
Func in;
in(c,x) = input(c,x);
// define algorithm: each 4-byte input group (assumed UYVY: U Y0 V Y1) becomes 6 output bytes: Y0 U V Y1 U V
decode(c,x) = Halide::cast<uint8_t> (in(mux(c, {1,0,2,3,0,2}), x));
output(c,x) = decode(c,x);
}
void schedule() {
output.bound_extent(c,6);
output.split(x, xo, xi, input.dim(1).extent()/8);
output.parallel(xo,2);
output.parallel(xi,2);
output.unroll(c);
output.vectorize(xi,128);
}
};
// Register the generator; building it produces a function you can call from your program
HALIDE_REGISTER_GENERATOR(Yuv422Decoder, yuv422decoder);
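A Halide pipeline defines its Output and intermediate Funcs but never assigns to an Input buffer, so if the input bytes change after a call, some write went out of bounds. One way to confirm this from the calling program is to snapshot the input before the call and compare afterwards. The sketch below is plain C++; input_unchanged_after is a hypothetical helper, and the callable stands in for the call into the compiled generator.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical debugging helper. The pipeline only reads `input`, so it should
// be byte-for-byte identical after the call; if it is not, some write (for
// example, into an undersized output allocation) went out of bounds.
template <typename Call>
bool input_unchanged_after(const std::vector<uint8_t>& input, Call&& call) {
    const std::vector<uint8_t> snapshot = input;  // copy taken before the call
    std::forward<Call>(call)();                    // e.g. a lambda wrapping the pipeline call
    return snapshot == input;
}
```

Wrapping the pipeline call in a lambda, e.g. input_unchanged_after(in, [&] { /* call the pipeline */ }), and asserting the result in a debug build would flag the corruption right where it happens.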
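The problem was the size of the output buffer. After doubling it, the input buffer is no longer modified. I am not sure exactly how an undersized output buffer causes this, but the most likely explanation is that the pipeline writes 6 output bytes for every 4 input bytes, so an output allocation of only size bytes is overrun and the out-of-bounds writes land in adjacent heap memory, which here happened to hold the input buffer. In the end, it was a programming mistake.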
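// 6 output bytes per 4-byte input group: 6 * (size / 4) bytes is the minimum
uint8_t* buffer_out = new uint8_t [2*size];
// c: min 0, extent 6, stride 1; x: min 0, extent size/4, stride 6
Runtime::Buffer<uint8_t> out(buffer_out, {{0,6,1}, {0,size/4,6}});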
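The arithmetic behind the fix: with the shape {{0,6,1}, {0,size/4,6}}, the pipeline writes 6 bytes for each of the size/4 groups, i.e. 6 * (size/4) bytes, which is 1.5 times an input of size bytes. A Runtime::Buffer wrapped around a raw pointer trusts the shape it is given and has no way of knowing how much memory was actually allocated, so an allocation of only size bytes lets the pipeline write past its end. The plain C++ sketch below (hypothetical helper names, no Halide required) computes the required size and places guard bytes after the output region so that an overrun can be detected in a debug build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kGuardBytes = 64;
constexpr uint8_t kGuardValue = 0xA5;

// Bytes the decoder writes for `input_bytes` of input, assuming the output
// shape {{0, 6, 1}, {0, input_bytes / 4, 6}}: 6 bytes per 4-byte group.
std::size_t required_output_bytes(std::size_t input_bytes) {
    assert(input_bytes % 4 == 0 && "input must contain whole 4-byte groups");
    return 6 * (input_bytes / 4);
}

// Allocates `bytes` of output storage followed by kGuardBytes guard bytes.
// The whole allocation starts as the guard pattern; the pipeline is expected
// to overwrite only the first `bytes` bytes.
std::vector<uint8_t> make_guarded_output(std::size_t bytes) {
    return std::vector<uint8_t>(bytes + kGuardBytes, kGuardValue);
}

// True if every byte after the first `bytes` still holds the guard pattern.
bool guard_intact(const std::vector<uint8_t>& storage, std::size_t bytes) {
    for (std::size_t i = bytes; i < storage.size(); ++i) {
        if (storage[i] != kGuardValue) return false;
    }
    return true;
}
```

In the calling program, storage.data() would take the place of buffer_out in the Runtime::Buffer constructor (with the same shape), and guard_intact(storage, required_output_bytes(size)) can be checked after the call. Using std::vector also releases the memory automatically, which the new[] allocation only does with a matching delete[].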