python arrays dask interleave

Interleave/interweave Dask Arrays lazily


I need to frame-by-frame interleave two large HDF5 datasets representing video frames from two channels of a microscopic measurement. I thought Dask would be appropriate for this job and the downstream processes.

The two arrays have the same shape and data type. Following the answers to "Interweaving two numpy arrays", I can do this with NumPy for arrays that fit in memory:

import numpy as np
# a numpy example of channel 1 data
ch1 = np.arange(1,5)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))

# channel 2 has the same shape and dtype
ch2 = np.arange(10,50,10)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))

# the interleaving starts with allocating a new array with double the size along the first dimension
ch1_2 = np.empty((2*ch1.shape[0],*ch1.shape[1:]), dtype=ch1.dtype)
# two assignments take care of the interleaving
ch1_2[0::2] = ch1
ch1_2[1::2] = ch2

Unfortunately, the same approach does not work with Dask arrays.

import dask.array as da
da_ch1 = da.from_array(ch1)
da_ch2 = da.from_array(ch2)
da_ch1_2 = da.empty((2*da_ch1.shape[0],*da_ch1.shape[1:]), dtype=da_ch1.dtype)
da_ch1_2[0::2] = da_ch1
da_ch1_2[1::2] = da_ch2

It fails with: "Item assignment with <class 'slice'> not supported".

Can anybody help me with a Dask compatible alternative approach? Any help would be appreciated.


Solution

  • The code below works on the small example data you posted. You may also need to write a similar delayed function for reading your HDF5 data.

    import dask.array as da
    from dask import delayed
    import numpy as np
    
    @delayed
    def interleave(x1, x2):
        # derive the output shape and dtype from the inputs instead of globals
        x1_2 = np.empty((2*x1.shape[0], *x1.shape[1:]), dtype=x1.dtype)
        x1_2[0::2] = x1
        x1_2[1::2] = x2
        return x1_2
    
    # a numpy example of channel 1 data
    ch1 = np.arange(1,5)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
    
    # channel 2 has the same shape and dtype
    ch2 = np.arange(10,50,10)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
    
    # Interleave using dask delayed
    ch1_2_shape = (2*ch1.shape[0],*ch1.shape[1:])
    ch1_2 = interleave(ch1, ch2)
    
    # Convert to dask array if required
    ch1_2 = da.from_delayed(interleave(ch1, ch2), ch1_2_shape, dtype=ch1.dtype)
    
    ch1_2.compute()
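
    Note that the delayed function above builds the whole interleaved result in a single task. If the data is larger than memory, one possible alternative is to stay within Dask array operations: stack the two channels along a new second axis and then merge the first two axes with a reshape. The following is a minimal sketch under that assumption; `interleave_lazy` is a hypothetical helper name, the chunk sizes are arbitrary, and Dask may rechunk internally during the reshape, so it is worth checking the resulting chunks on real data.

```python
import dask.array as da
import numpy as np

def interleave_lazy(a, b):
    """Interleave two same-shaped Dask arrays along axis 0 without computing them."""
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    # stacking on axis 1 pairs the frames: a[i] sits at [i, 0], b[i] at [i, 1]
    paired = da.stack([a, b], axis=1)
    # merging the first two axes gives the order a[0], b[0], a[1], b[1], ...
    return paired.reshape((2 * a.shape[0], *a.shape[1:]))

# same small example data as in the question
ch1 = np.arange(1, 5)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))
ch2 = np.arange(10, 50, 10)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))

# chunk along the frame axis (chunk size chosen only for illustration)
da_ch1 = da.from_array(ch1, chunks=(2, 3, 2))
da_ch2 = da.from_array(ch2, chunks=(2, 3, 2))

da_ch1_2 = interleave_lazy(da_ch1, da_ch2)  # still a lazy Dask array
result = da_ch1_2.compute()                 # materialize only to check the small example
```

    On the example data, `result[0::2]` should equal `ch1` and `result[1::2]` should equal `ch2`, matching the NumPy version in the question.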