In Blosc.jl, I can do
using Blosc
x = rand(100)
Blosc.compress(x)
Blosc.compress(x[1:10]) # this allocates
but
Blosc.compress(@view x[1:10]) # this doesn't allocate
fails. Is there a workaround that lets me compress @view x[1:10]?
If you wish to skip the allocation of the sub-array caused by indexing x, Blosc.jl offers an (undocumented) Blosc.compress(::Ptr{T}, ::Integer) method that takes a pointer to contiguous memory holding elements of type T and the number of bytes to compress (for the UInt8 data below, that is the same as the number of elements). You can use it like this:
using Blosc, BenchmarkTools
x = rand(UInt8, 10000)
@btime Blosc.compress(pointer($x), 1000) # 17.400 μs (2 allocations: 1.09 KiB)
@btime Blosc.compress($x[1:1000]) # 17.400 μs (4 allocations: 2.19 KiB)
@assert Blosc.compress(pointer(x), 1000) == Blosc.compress(x[1:1000])
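To apply the same idea to the view from the question, a small wrapper can be sketched as follows. `compress_view` is a hypothetical helper, not part of Blosc.jl, and it assumes the second argument of the pointer method is a size in bytes, which is why the element count is multiplied by `sizeof(T)` for the `Float64` data.

```julia
using Blosc

# Hypothetical helper (not part of Blosc.jl): compress a unit-stride 1-D view
# without first copying it into a new Array.
function compress_view(v::SubArray{T,1}) where {T}
    # Only views whose elements are adjacent in memory are safe to pass on.
    stride(v, 1) == 1 || throw(ArgumentError("view must have unit stride"))
    # Keep the underlying array alive while its raw pointer is in use.
    GC.@preserve v begin
        # Assumption: the size argument is counted in bytes, not elements.
        Blosc.compress(pointer(v), sizeof(T) * length(v))
    end
end

x = rand(100)
v = @view x[1:10]

# The result should match compressing an explicit copy of the same elements.
@assert compress_view(v) == Blosc.compress(x[1:10])
```

The stride check is the important part: a view such as `@view x[1:2:10]` would be rejected rather than silently compressing the wrong bytes.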
The algorithm implemented in Blosc.jl passes data to the libblosc C library, which relies on the data being stored densely in contiguous bytes. The SubArray that is created by @view can have an arbitrary stride, even when its type signals linear indexing, so there isn't a good way for Blosc.jl to support SubArrays without copying the data into a contiguous array with a known unit stride, like a DenseArray.
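The stride problem can be seen with nothing but Base. The snippet below is a minimal sketch: both views are `SubArray`s and both report linear indexing, yet only one of them stores its elements next to each other in memory.

```julia
x = collect(1.0:100.0)

v1 = @view x[1:10]     # unit-stride view: elements are adjacent in memory
v2 = @view x[1:2:10]   # strided view: every other element of x

# The memory layout differs between the two views...
@assert strides(v1) == (1,)
@assert strides(v2) == (2,)

# ...but neither is a DenseArray, and both advertise linear indexing,
# so the type alone does not tell a C library the data is contiguous.
@assert v1 isa SubArray && v2 isa SubArray
@assert !(v1 isa DenseArray) && !(v2 isa DenseArray)
@assert IndexStyle(v1) isa IndexLinear
@assert IndexStyle(v2) isa IndexLinear
```

A runtime check such as `stride(v, 1) == 1` is therefore needed before handing a view's pointer to code that expects dense storage.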
Update: There appears to be some movement in Julia's GitHub issue tracker concerning a means to identify arrays with dense data storage. While this does not specify dense storage in the SubArray type, as needed for multiple dispatch to work properly, it would at least allow the Blosc code to branch appropriately to avoid copying to a DenseArray in certain special cases.