Initially I have a ByteString, which I then unpack and convert into Int16s; this part of the process takes relatively little time. I then convert the list of Int16s into a Repa array with the following line:
Repa.fromListUnboxed (Z :. bytesOfDataPerImage `div` 2) listOfInts
According to the profiler this line is taking ~40% of CPU time, which could just be indicative that the computations I am performing don't warrant the use of Repa. Is there a more efficient route to take when going from ByteString to Repa array?
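For reference, a minimal sketch of what this list-based path looks like (the names bytesViaList and pairUp are illustrative; it assumes big-endian byte order and an even number of bytes):

-- Illustrative sketch of the list-based pipeline described above.
import qualified Data.ByteString as Bytes
import Data.Array.Repa as Repa
import Data.Bits (shiftL, (.|.))
import Data.Int (Int16)
import Data.Word (Word8)

bytesViaList :: Bytes.ByteString -> Repa.Array Repa.U Repa.DIM1 Int16
bytesViaList bs =
  Repa.fromListUnboxed (Z :. (Bytes.length bs `div` 2)) (pairUp (Bytes.unpack bs))
  where
    -- Combine each pair of consecutive bytes into one Int16 (first byte high).
    pairUp :: [Word8] -> [Int16]
    pairUp (hi : lo : rest) = ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo) : pairUp rest
    pairUp _ = []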
I have tried the Repa fromByteString function, though the transformation Array B DIM1 Word8 -> Array U DIM1 Int16 was pretty slow. I was doing this by first reshaping the array into a 2D array of Word8s, then folding each pair into an Int16. Perhaps the byte array was the right approach and my conversion method is just wrong.
-- Fold each two-byte row into a single Int16.
convertImageData :: Array B DIM1 Word8 -> Array U DIM1 Int16
convertImageData !arr = Repa.foldS convertWords 0 (Repa.map fromIntegral (splitArray arr))

-- Reshape the flat byte array into rows of two bytes each.
splitArray :: Array B DIM1 Word8 -> Array U DIM2 Word8
splitArray !arr = computeUnboxedS $ reshape (Z :. ((size $ extent arr) `div` 2) :. 2) arr

-- Combine two bytes (already widened to Int16); the first argument becomes the high byte.
convertWords :: Int16 -> Int16 -> Int16
convertWords !word1 !word2 = (word1 `shiftL` 8) .|. word2
For some context, this program is being benchmarked against the same program written in C/C++.
Your initial approach of converting into a list and later calling Repa.fromListUnboxed is certainly very slow, since all you are doing is forcing the elements of a list and then loading them sequentially into the unboxed array. That is why the conversion into a list takes very little time: it merely creates a bunch of thunks, and the actual computation happens when you load them into the array.
Your second approach is definitely much better, but there are still unnecessary steps: for example, there is no need to reshape the array, since you can pass the new shape directly to the fromByteString function. So here is a slightly improved version (with the imports the snippets below rely on spelled out):
import qualified Data.Array.Repa as Repa
import qualified Data.Array.Repa.Repr.ByteString as Repa
import qualified Data.ByteString as Bytes
import qualified Data.ByteString.Unsafe as Bytes
import Data.Bits (shiftL, (.|.))
import Data.Int (Int16)

bytesToRepaOriginal :: Bytes.ByteString -> Repa.Array Repa.U Repa.DIM1 Int16
bytesToRepaOriginal bs =
  Repa.foldS
    convertWords
    0
    (Repa.map fromIntegral $
     Repa.fromByteString (Repa.Z Repa.:. (Bytes.length bs `div` 2) Repa.:. 2) bs)
The fromByteString function and the B representation in Repa aren't particularly fast for some reason, so there is a faster way to do it, namely constructing the array by indexing the ByteString directly:
-- Build the array by reading two bytes per element straight from the
-- ByteString (big-endian: the first byte becomes the high byte).
bytesToRepaP :: Monad m => Bytes.ByteString -> m (Repa.Array Repa.U Repa.DIM1 Int16)
bytesToRepaP bs =
  Repa.computeUnboxedP $
  Repa.fromFunction
    (Repa.Z Repa.:. (Bytes.length bs `div` 2))
    (\(Repa.Z Repa.:. i) ->
       let i' = i * 2
           f = Bytes.unsafeIndex bs
        in (fromIntegral (f i') `shiftL` 8) .|. fromIntegral (f (i' + 1)))
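Since bytesToRepaP is written for an arbitrary monad (computeUnboxedP is monadic so that parallel computations are sequenced properly), you would typically run it in IO, e.g. arr <- bytesToRepaP bs.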
Switching to sequential computation with Repa.computeUnboxedS gives roughly a 2x slowdown, but since we are trying to optimize this, we should go all the way with parallel computation.
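For comparison, a minimal sketch of that sequential variant (the name bytesToRepaS is made up; it is the same indexing logic, just computed with computeUnboxedS, which makes it pure):

-- Hypothetical sequential counterpart of bytesToRepaP: same element
-- function, but computed single-threaded with computeUnboxedS.
bytesToRepaS :: Bytes.ByteString -> Repa.Array Repa.U Repa.DIM1 Int16
bytesToRepaS bs =
  Repa.computeUnboxedS $
  Repa.fromFunction
    (Repa.Z Repa.:. (Bytes.length bs `div` 2))
    (\(Repa.Z Repa.:. i) ->
       let i' = i * 2
           f = Bytes.unsafeIndex bs
        in (fromIntegral (f i') `shiftL` 8) .|. fromIntegral (f (i' + 1)))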
Not to steal all the thunder from the very nice Repa library, I'd also like to show how all of that would work with the new massiv library:
import Data.Massiv.Array as Massiv
bytesToMassiv :: Bytes.ByteString -> Massiv.Array Massiv.U Massiv.Ix1 Int16
bytesToMassiv bs =
  Massiv.makeArrayR U Par (Bytes.length bs `div` 2)
    (\i ->
       let i' = i * 2
           f = Bytes.unsafeIndex bs
        in (fromIntegral (f i') `shiftL` 8) .|. fromIntegral (f (i' + 1)))
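Note that the massiv API has evolved since this was written: in recent versions the size argument to makeArrayR is wrapped in a Sz (e.g. Sz1 (Bytes.length bs `div` 2)), but the structure of the code stays the same.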
Just to present some concrete numbers showing the optimizations in action, here are stripped-down criterion benchmarks:
benchmarking fromByteString/Massiv Parallel
time 1.114 ms (1.079 ms .. 1.156 ms)
benchmarking fromByteString/Repa Parallel
time 1.954 ms (1.871 ms .. 2.040 ms)
benchmarking fromByteString/Repa Original
time 15.80 ms (15.67 ms .. 15.92 ms)