rust ffi

Rust zero-cost handling of C two-dimensional array without pointer arithmetic?


I’m porting old C audio processing code to Rust. All over the place there are functions that receive two-dimensional float arrays (float**), along with their respective lengths (in other words, audio sample buffers, with their number of channels and samples).

In C those are accessed like

for (int c=0; c<channels; c++) {
    for (int s=0; s<samples; s++) {
        buffer[c][s] = 0.0f;
    }
}

In Rust, I suppose one could use the usual pointer arithmetic (`*(*buffer.add(c)).add(s)`) just as well, since the lengths of the arrays are known at all times. But if there is a method that

  1. incurs no considerable extra cost or extra allocations relative to the C/pointer-arithmetic method (treating the cost of bounds checks as negligible, since they would also be incurred if the function were pure Rust without FFI in the first place)
  2. involves no pointer arithmetic (or is at least safer than direct raw pointer manipulation)

I would like to know of such a method.

At first I assumed C arrays of arrays would translate to Rust slices of slices (`&mut [&mut [f32]]`), but that conversion seems to be neither trivial nor ideal, hence this question.


Solution

  • You can make a contiguous 2D buffer (a Vec of mutable slices into a single flat Vec) without too much trouble, and let Rust optimize away most of the bounds checking:

    fn init_buffer(buffer: &mut [&mut [f32]]) {
        let mut f = 1.0;
        for row in buffer {
            for cell in row.iter_mut() {
                *cell = f;
                f += 1.0;
            }
        }
    }
    
    fn main() {
        let channels = 10;
        let samples = 15;
        // Base 1d array
        let mut buf_raw = vec![0.0; channels * samples];
    
        // Vector of `samples`-element slices, one per channel
        let mut buf: Vec<_> = buf_raw.as_mut_slice().chunks_mut(samples).collect();
    
        init_buffer(buf.as_mut_slice());
        println!("{buf:?}");
    }
    

    Rust Playground link

    I had it assign differing values just to make it obvious that it's working. The function, not knowing the data is contiguous, does need to check each slice's length to bound the inner loop, but the incremental overhead should be minimal: it's one check per row, not a bounds check per cell. If the function gets inlined, the optimizer may be able to eliminate even that.

    It may even be faster than the C code, depending on precisely how the C code created the 2D array. The underlying data here is fully contiguous, which is very close to the ideal C layout for runtime-sized "contiguous" 2D arrays (an array of pointers into a single 1D block holding all the data), reducing cache misses and the like. A C array of pointers that dynamically allocates each sub-array separately would be more fragmented.