arraysperformancehaskellrepa

Repa creation from ByteString


Initially I have a ByteString, which i then unpack and convert into Int16s, this part of the process takes relatively little time. I then go to convert the list of Int16s into a Repa array with the following line,

Repa.fromListUnboxed (Z :. bytesOfDataPerImage `div` 2) listOfInts

According to the profiler this line is taking ~40% of CPU time, which could just be indicative that the computations I am performing don't warrant the use of Repa. Is there a more efficient route to take when going from ByteString to Repa array?

I have tried the Repa fromByteString function, though the transformation of

Array B DIM1 Word8 -> Array U DIM1 Int16

was pretty slow. I was performing this by first reshaping the array into a 2d array of Word8s, then folding into Int16s. Perhaps the Byte array was the right approach and my conversion method is just wrong.

convertImageData :: Array B DIM1 Word8 -> Array U DIM1 Int16
convertImageData !arr = Repa.foldS convertWords 0 (Repa.map fromIntegral (splitArray arr))

splitArray :: Array B DIM1 Word8 -> Array U DIM2 Word8
splitArray !arr = computeUnboxedS $ reshape (Z :. ((size $ extent arr) `div` 2) :. 2) arr


convertWords :: Int16 -> Int16 -> Int16
convertWords !word1 !word2 = (word1 `shiftL` 8) .|. word2

For some context this program is being benchmarked against the same program written in C/C++.


Solution

  • Your initial approach of converting into a list and later calling Repa.fromListUnboxed is certainly very slow, since all you are doing is forcing elements of a list and than loading it sequentially into the unboxed array. That is why conversion into a list takes very little time, since all it does is it creates a bunch of thunks, but the actual computation happens when you load it into the array.

    Your second approach is definitely way better, but there are still unnecessary steps, eg. there is no need to reshape the array, you can just pass the new size to the fromByteString function`. So here is a slightly improved version:

    bytesToRepaOriginal :: Bytes.ByteString -> Repa.Array Repa.U Repa.DIM1 Int16
    bytesToRepaOriginal bs =
      Repa.foldS
        convertWords
        0
        (Repa.map fromIntegral $
         Repa.fromByteString (Repa.Z Repa.:. (Bytes.length bs `div` 2) Repa.:. 2) bs)
    

    fromByteString function and B representation in Repa isn't particularly fast for some reason, so there is a faster way to do it, namely to construct an array by directly indexing the ByteString:

    bytesToRepaP :: Monad m => Bytes.ByteString -> m (Repa.Array Repa.U Repa.DIM1 Int16)
    bytesToRepaP bs =
      Repa.computeUnboxedP $
      Repa.fromFunction
        (Repa.Z Repa.:. (Bytes.length bs `div` 2))
        (\(Repa.Z Repa.:. i) ->
           let i' = i * 2
               f = Bytes.unsafeIndex bs
            in (fromIntegral (f i') `shiftL` 8) .|. fromIntegral (f (i' + 1)))
    

    Switching to sequential computation with Repa.computeUnboxedS will give you a factor of x2 slow down, but since we are trying optimize it, we need go all the way with parallel computation.

    Not to steal all the thunder from a very nice Repa library, I'd like to also show how all that would work with new massiv library:

    import Data.Massiv.Array as Massiv
    
    bytesToMassiv :: Bytes.ByteString -> Massiv.Array Massiv.U Massiv.Ix1 Int16
    bytesToMassiv bs =
      Massiv.makeArrayR U Par (Bytes.length bs `div` 2)
        (\i ->
           let i' = i * 2
               f = Bytes.unsafeIndex bs
            in (fromIntegral (f i') `shiftL` 8) .|. fromIntegral (f (i' + 1)))
    

    Just to present some concrete numbers showing the optimizations in action here is a stripped down criterion benchmarks:

    benchmarking fromByteString/Massiv Parallel
    time                 1.114 ms   (1.079 ms .. 1.156 ms)
    
    benchmarking fromByteString/Repa Parallel
    time                 1.954 ms   (1.871 ms .. 2.040 ms)
    
    benchmarking fromByteString/Repa Original
    time                 15.80 ms   (15.67 ms .. 15.92 ms)