Halide non-contiguous memory layout

Is it possible to use a non-C/Fortran ordering in Halide? Given dimensions x, y, c, I want x to vary the fastest and c to vary the second fastest (in numpy terms, counted in elements, the strides would be .strides = (W*C, 1, W)). Our memory layout is a stack of images where the channels of each image are stacked by scanline.
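To make the layout concrete, here is a minimal sketch of the offset arithmetic it implies. The sizes W, H, C and the helper name offset are made up for illustration only; they are not part of Halide or numpy.

```python
# Hypothetical sizes, chosen only for illustration.
W, H, C = 4, 5, 3

def offset(x, y, c):
    """Element offset of (x, y, c) when x varies fastest, then c, then y."""
    return x + c * W + y * W * C

# Walking memory front to back visits x first, then c, then y.
order = [(x, y, c) for y in range(H) for c in range(C) for x in range(W)]
offsets = [offset(x, y, c) for (x, y, c) in order]
```

With this ordering, stepping one channel moves W elements and stepping one scanline moves W*C elements, which is where the strides quoted above come from.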

(Sorry if the layout still isn't clear enough, I can try to clarify). Using the python bindings, I always get ValueError: ndarray is not contiguous when trying to pass in my numpy array with .strides set.

I've tried changing the numpy array to use contiguous strides (without changing the memory layout) just to get it into Halide, then calling .set_stride in Halide, but no luck. I just want to make sure I'm not trying to do something that can't or shouldn't be done.

I think this is similar to the line-by-line layout mentioned at https://halide-lang.org/tutorials/tutorial_lesson_16_rgb_generate.html, except with more entries along the c dimension, since the images are "stacked" along the channel axis (to produce a W, H, C*image_count tensor).

Any advice would be much appreciated.

Thanks!

Solution

This is more of a numpy question than a Halide one. The following Halide code illustrates use of an array in the shape you are looking for (I think):

import halide as hl
import numpy as np

x, y, c = hl.Var('x'), hl.Var('y'), hl.Var('c')
f = hl.Func('f')
f[x, y, c] = (x * 3) + (y * 12) + c
# This would be necessary for internally allocated buffers
# f.reorder_storage(x, c, y)

# These control output layout. For extents (4, 5, 3), x keeps stride 1,
# y gets stride W*C = 12, and c gets stride W = 4.
f.output_buffer().dim(1).set_stride(12)
f.output_buffer().dim(2).set_stride(4)
# Probably wanted for efficiency
f.reorder(x, c, y)
result = f.realize(4, 5, 3)

print(result, result[0, 1, 1])
np_result = np.array(result)
print(np_result, np_result[0, 1, 1])
print(np_result.shape, " ", np_result.strides, " ", np_result.flags)

I'm not well versed in numpy and not sure how you would allocate an array in that layout from scratch, but the answer might have to be something like numpy.lib.stride_tricks.as_strided.
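On the numpy side, one way to get such a layout from scratch is to allocate contiguous storage in memory order and then take a transposed view of it. The sketch below is pure numpy and does not touch Halide; the sizes and the variable names (storage, view, same) are hypothetical, and whether the Halide Python bindings accept the resulting view is not something this example verifies.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

W, H, C = 4, 5, 3  # hypothetical sizes

# Contiguous storage indexed (y, c, x): x varies fastest, then c, then y.
storage = np.arange(H * C * W, dtype=np.int32).reshape(H, C, W)

# A view over the same memory, indexed (y, x, c). No data is copied.
view = storage.transpose(0, 2, 1)

# numpy reports strides in bytes; divide by itemsize to get element strides.
elem_strides = tuple(s // view.itemsize for s in view.strides)

# The same view built explicitly with as_strided (strides given in bytes).
flat = storage.reshape(-1)
item = flat.itemsize
same = as_strided(flat, shape=(H, W, C), strides=(W * C * item, item, W * item))
```

Here elem_strides comes out as (W*C, 1, W), matching the strides in the question. The view is not C-contiguous, which is consistent with the ValueError reported above, while storage itself is contiguous and holds exactly the same bytes.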