
When using Blosc.jl, is it possible to compress part of an array without first making a copy of that part?


In Blosc.jl, I can do

    using Blosc

    x = rand(100)
    Blosc.compress(x)
    Blosc.compress(x[1:10]) # this allocates

but

    Blosc.compress(@view x[1:10]) # this doesn't allocate

fails. Is there a workaround so I can get @view x[1:10] to be compressed?


Solution

  • If you want to skip the allocation of the sub-array caused by indexing x, Blosc.jl offers an (undocumented) Blosc.compress(::Ptr{T}, ::Integer) method that takes a pointer to contiguous memory holding elements of type T and the number of bytes to compress (for UInt8 data this equals the number of elements). You can use it like this:

    using Blosc, BenchmarkTools
    x = rand(UInt8, 10000)  # UInt8 elements, so 1000 elements == 1000 bytes
    
    @btime Blosc.compress(pointer($x), 1000) # 17.400 μs (2 allocations: 1.09 KiB)
    @btime Blosc.compress($x[1:1000])        # 17.400 μs (4 allocations: 2.19 KiB)
    
    @assert Blosc.compress(pointer(x), 1000) == Blosc.compress(x[1:1000])
    

    Blosc.jl hands the data to the libblosc C library, which requires the input to be stored densely as one contiguous block of bytes. A SubArray created by @view can have an arbitrary stride: for example, @view x[1:2:end] has a stride of 2 even though its type still reports fast linear indexing (IndexLinear()). Because the SubArray type alone does not guarantee unit stride, there isn't a good way for Blosc.jl to support SubArrays in general without first copying the data into a contiguous array with a known unit stride, such as a DenseArray.

    Update: There appears to be some movement in an issue on Julia's GitHub tracker toward a way of identifying arrays with dense data storage. That proposal would not encode dense storage in the SubArray type itself, which is what multiple dispatch would need, but it would at least let the Blosc code branch at run time and avoid copying into a DenseArray in the special cases where the data is already contiguous.
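The run-time branching idea from the update can already be done on the caller's side. The following is a minimal sketch, assuming Blosc.jl is installed and assuming the second argument of the pointer method is a byte count (libblosc's own compression call takes a size in bytes; the UInt8 example cannot tell bytes from elements apart). The helpers is_unit_stride and compress_view are hypothetical names, not part of Blosc.jl.

```julia
using Blosc  # assumes Blosc.jl is installed

# Hypothetical helper (not part of Blosc.jl): true when the elements of `v`
# sit back to back in memory. A view such as x[1:2:end] is still a
# StridedVector with IndexLinear(), but its stride is 2, so it fails this check.
# Views with arbitrary index vectors (x[[1, 3, 5]]) are not StridedVectors at all.
is_unit_stride(v::AbstractVector) = v isa StridedVector && stride(v, 1) == 1

# Hypothetical helper: compress a vector or view, avoiding a copy when possible.
function compress_view(v::AbstractVector{T}) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be an isbits type"))
    if is_unit_stride(v)
        # Contiguous case: hand Blosc a raw pointer to the first element.
        # ASSUMPTION: the second argument is the size in bytes, not elements.
        # GC.@preserve keeps `v` (and therefore its parent array) alive while
        # the raw pointer is in use.
        return GC.@preserve v Blosc.compress(pointer(v), sizeof(T) * length(v))
    else
        # Strided or fancy-indexed views: fall back to copying into a
        # contiguous Vector, just as x[1:10] does.
        return Blosc.compress(collect(v))
    end
end
```

For the question's case, compress_view(@view x[1:10]) takes the pointer branch without copying, while compress_view(@view x[1:2:end]) still copies, since its elements are not adjacent in memory.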