haskellbytestringmemory-profiling

No heap profiling data for module Data.ByteString


I was trying to generate heap memory profile for following naive Haskell code that copies a file:

import System.Environment
import System.IO
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as LB

naiveCopy :: String -> String -> IO ()
naiveCopy from to = do
  putStrLn $ "From: " ++ from
  putStrLn $ "To: " ++ to
  s <- B.readFile from
  B.writeFile to s

main = do
  args <- getArgs
  mapM (\ x-> putStrLn x) args
  naiveCopy (head args) ((head.tail) args)

Command that build the code with ghc 8.0.1:

ghc -o t -rtsopts -prof -fprof-auto t.hs

Command that collect the profiling data:

./t +RTS -p -h -RTS in/data out/data && hp2ps -e8in -c t.hp

where in/data is a quite big file (approx 500MB) which will take the program about 2 seconds to copy.

The problem is that I couldn't get heap profiling data if I use the strict Data.ByteString, there's only an small t.hp file without any sample data, it looks like this:

JOB "t in/data out/data +RTS -p -h"
DATE "Thu Aug  4 20:19 2016"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.000000
END_SAMPLE 0.000000
BEGIN_SAMPLE 0.943188
END_SAMPLE 0.943188

and corresponding profile chart like this: Strict ByteString without profiling data

However I could get heap profiling data if I switch to the lazy version Data.ByteString.Lazy, profile chart like this: Lazy ByteString with profiling data

Update: Thanks @ryachza, I added a -i0 parameter to set sampling interval and tried again, this time I got sample data for strict ByteString and it looked reasonable (I was copying a 500M file and the memory allocation peak in following profiling chart is about 500M)

./t +RTS -p -h -RTS in/data out/data && hp2ps -e8in -c t.hp

Strict ByteString with profiling data


Solution

  • It appears as though the runtime isn't "getting the chance to measure" the heap. If you add -s to your RTS options, it should print some time and allocation information. When I run this, I see the bytes allocated and total memory use is very high (size of the file), but the maximum residency (and the number of samples) is very low, and while the elapsed time is high the actual "work" time is practically 0.

    Adding the RTS option -i0 allowed me to reproducibly visualize the bytestring allocation as PINNED (this is the classification because the byte arrays that bytestring uses internally are allocated in an area in which the GC can't move things). You could experiment with different -h options which associate allocations to different cost centers (for example, -hy should show ARR_WORDS) but it probably wouldn't have much value in this case as the bytestrings are really just "big chunks of raw memory".

    The references I used to find the RTS options were (clearly I wasn't particular about the GHC version - I can't imagine these flags change frequently):

    1. https://downloads.haskell.org/~ghc/7.0.1/docs/html/users_guide/runtime-control.html
    2. https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html