halide

How to realize a 3-level cache with Halide


I hope to use Halide to simulate a three-level cache architecture for a CPU, where the memory at each level is split into smaller chunks for the level below it, such as:

(256*256) -> 16 * (16*256) -> 16 * 4 * (4*256) -> 16 * (16*256) -> (256*256)

I tried the following C++ code:

Func l1, l2, l3, l2_out, l3_out;
l3.store_in(MemoryType::L3);
l2.store_in(MemoryType::L2);
l1.store_in(MemoryType::L1);
l3_out.store_in(MemoryType::L3);
l2_out.store_in(MemoryType::L2);
auto l2_size = 16*256, l1_size = 4*256; 
for (auto i = 0; i < 16; i++) { // 16 times l3->l2
    RDom r_l2(0, l2_size, "l2_reduce");
    l2(x2) = l3(x2);
    l2(r_l2) = l3(i * l2_size + r_l2);
    for (auto j = 0; j < 4; j++) { // 4 times l1->l2
        RDom r_l1(0, l1_size, "l1_reduce");
        l1(x1) = l2(j * l1_size + r_l1);
        l2_out(j*l1_size+r_l1) = l1(r_l1);
    }
    l3_out(i*l2_size + r_l2) = l2_out(r_l2);
}

It seems I can't reference a reduction domain in a pure function definition:

terminate called after throwing an instance of 'Halide::CompileError'
  what():  Error: In pure definition of Func "l1$0":
Reduction domain referenced in pure function definition.

Is there any way to make this run?


Solution

  • You first need to give l1(x1) a pure definition, one that does not reference a reduction domain. Only update definitions that follow it may use an RDom.

    For instance, the following should silence the error:

    l1(x1) = 0;
    l1(x1) = l2(j * l1_size + r_l1);
    

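    However, the above is a nonsensical update to use: with the pure variable x1 on the left-hand side and r_l1 only on the right-hand side, every element of l1 is overwritten once per value of r_l1, so all of them end up holding the last value read. The error is likely an indication of a logical error in the code. I have not tried to understand it in detail, but it could be that you want to use l1(x1) = l2(j * l1_size + x1) as a pure definition, or r_l1 on the left-hand side of the second expression above.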
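
    To make the difference between the two forms concrete, here is a minimal sketch in plain C++ (no Halide involved) of the loop nests that the two update definitions correspond to. The function names update_with_pure_lhs and update_with_rdom_lhs are hypothetical, and the model assumes l1 is realized over the same extent as the reduction domain; it illustrates the semantics only and is not what Halide actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models `l1(x1) = 0; l1(x1) = l2(base + r);` with x1 and r in [0, extent).
// The pure variable on the left means every element is rewritten for each
// value of r, so each element ends up holding the value read at the last r.
std::vector<int> update_with_pure_lhs(const std::vector<int>& l2,
                                      std::size_t base, std::size_t extent) {
    assert(base + extent <= l2.size());
    std::vector<int> l1(extent, 0);  // pure definition: l1(x1) = 0
    for (std::size_t r = 0; r < extent; ++r) {        // reduction domain
        for (std::size_t x1 = 0; x1 < extent; ++x1) { // pure variable
            l1[x1] = l2[base + r];
        }
    }
    return l1;
}

// Models `l1(x1) = 0; l1(r) = l2(base + r);` with r in [0, extent).
// Here the reduction variable indexes both sides, so this is an
// element-by-element copy of one chunk of l2 into l1.
std::vector<int> update_with_rdom_lhs(const std::vector<int>& l2,
                                      std::size_t base, std::size_t extent) {
    assert(base + extent <= l2.size());
    std::vector<int> l1(extent, 0);  // pure definition: l1(x1) = 0
    for (std::size_t r = 0; r < extent; ++r) {
        l1[r] = l2[base + r];
    }
    return l1;
}
```

    With l2 holding 16*256 elements and base = j * l1_size, extent = l1_size, the second function copies the j-th chunk of 4*256 elements, which appears to be what the question intends; the first fills l1 with a single repeated value.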