I want to use Halide to simulate a three-level cache hierarchy for a CPU. The memory at each level is split into smaller blocks for the level above it, for example:
(256*256) -> 16 * (16*256) -> 16 * 4 * (4*256) -> 16 * (16*256) -> (256*256)
I use this C++ code:
Func l1, l2, l3, l2_out, l3_out;
l3.store_in(MemoryType::L3);
l2.store_in(MemoryType::L2);
l1.store_in(MemoryType::L1);
l3_out.store_in(MemoryType::L3);
l2_out.store_in(MemoryType::L2);
auto l2_size = 16*256, l1_size = 4*256;
for (auto i = 0; i < 16; i++) { // 16 times l3->l2
    RDom r_l2(0, l2_size, "l2_reduce");
    l2(x2) = l3(x2);
    l2(r_l2) = l3(i * l2_size + r_l2);
    for (auto j = 0; j < 4; j++) { // 4 times l1->l2
        RDom r_l1(0, l1_size, "l1_reduce");
        l1(x1) = l2(j * l1_size + r_l1);
        l2_out(j*l1_size+r_l1) = l1(r_l1);
    }
    l3_out(i*l2_size + r_l2) = l2_out(r_l2);
}
It seems I can't reference a reduction domain in a pure function definition:
terminate called after throwing an instance of 'Halide::CompileError'
what(): Error: In pure definition of Func "l1$0":
Reduction domain referenced in pure function definition.
Is there any way to make this run?
You need to first define l1(x1) as something that does not contain a reduction domain.
For instance, the following should silence the error:
l1(x1) = 0;
l1(x1) = l2(j * l1_size + r_l1);
However, the update above makes little sense as written. The error is most likely a sign of a logical error in the code. I have not tried to understand it in detail, but it could be that you want l1(r_l1) on the right-hand side, or r_l1 on the left-hand side, of the second expression above.
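To make the "r_l1 on the left-hand side" reading concrete, here is a plain C++ sketch (no Halide involved) of the block-wise copy I assume the question is after. The function name copy_through_levels and the k-prefixed constants are my own, and the sizes are taken from the question. Each inner loop models what a Halide update over a reduction domain would do: visit every index in the domain and write that element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of the block-wise copy from the question (not Halide).
// Sizes are taken from the question; the names are hypothetical.
constexpr std::size_t kL3Size = 256 * 256;  // whole buffer
constexpr std::size_t kL2Size = 16 * 256;   // one L2 block
constexpr std::size_t kL1Size = 4 * 256;    // one L1 block

std::vector<int> copy_through_levels(const std::vector<int>& l3) {
    assert(l3.size() % kL2Size == 0);  // assumption: whole L2 blocks only
    std::vector<int> l3_out(l3.size());
    std::vector<int> l2(kL2Size), l2_out(kL2Size), l1(kL1Size);

    for (std::size_t i = 0; i < l3.size() / kL2Size; ++i) {
        // Models an update with the RDom on the left-hand side:
        //   l2(r_l2) = l3(i * l2_size + r_l2)
        for (std::size_t r2 = 0; r2 < kL2Size; ++r2)
            l2[r2] = l3[i * kL2Size + r2];

        for (std::size_t j = 0; j < kL2Size / kL1Size; ++j) {
            // Models l1(r_l1) = l2(j * l1_size + r_l1)
            for (std::size_t r1 = 0; r1 < kL1Size; ++r1)
                l1[r1] = l2[j * kL1Size + r1];
            // Models l2_out(j * l1_size + r_l1) = l1(r_l1)
            for (std::size_t r1 = 0; r1 < kL1Size; ++r1)
                l2_out[j * kL1Size + r1] = l1[r1];
        }

        // Models l3_out(i * l2_size + r_l2) = l2_out(r_l2)
        for (std::size_t r2 = 0; r2 < kL2Size; ++r2)
            l3_out[i * kL2Size + r2] = l2_out[r2];
    }
    return l3_out;
}
```

If this is the intended behavior, then in Halide each inner loop would be an update definition with the RDom variable on the left-hand side, placed after a pure definition such as l1(x1) = 0. This says nothing about how store_in or the schedule should look; it only pins down which elements get written.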