I was trying to generate heap memory profile for following naive Haskell code that copies a file:
import System.Environment
import System.IO
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as LB
naiveCopy :: String -> String -> IO ()
naiveCopy from to = do
putStrLn $ "From: " ++ from
putStrLn $ "To: " ++ to
s <- B.readFile from
B.writeFile to s
main = do
args <- getArgs
mapM (\ x-> putStrLn x) args
naiveCopy (head args) ((head.tail) args)
Command that build the code with ghc 8.0.1:
ghc -o t -rtsopts -prof -fprof-auto t.hs
Command that collect the profiling data:
./t +RTS -p -h -RTS in/data out/data && hp2ps -e8in -c t.hp
where in/data
is a quite big file (approx 500MB) which will take the program about 2 seconds to copy.
The problem is that I couldn't get heap profiling data if I use the strict Data.ByteString
, there's only an small t.hp file without any sample data, it looks like this:
JOB "t in/data out/data +RTS -p -h"
DATE "Thu Aug 4 20:19 2016"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.000000
END_SAMPLE 0.000000
BEGIN_SAMPLE 0.943188
END_SAMPLE 0.943188
and corresponding profile chart like this:
However I could get heap profiling data if I switch to the lazy version Data.ByteString.Lazy
, profile chart like this:
Update: Thanks @ryachza, I added a -i0
parameter to set sampling interval and tried again, this time I got sample data for strict ByteString
and it looked reasonable (I was copying a 500M file and the memory allocation peak in following profiling chart is about 500M)
./t +RTS -p -h -RTS in/data out/data && hp2ps -e8in -c t.hp
It appears as though the runtime isn't "getting the chance to measure" the heap. If you add -s
to your RTS
options, it should print some time and allocation information. When I run this, I see the bytes allocated and total memory use is very high (size of the file), but the maximum residency (and the number of samples) is very low, and while the elapsed time is high the actual "work" time is practically 0.
Adding the RTS option -i0
allowed me to reproducibly visualize the bytestring allocation as PINNED
(this is the classification because the byte arrays that bytestring uses internally are allocated in an area in which the GC can't move things). You could experiment with different -h
options which associate allocations to different cost centers (for example, -hy
should show ARR_WORDS
) but it probably wouldn't have much value in this case as the bytestrings are really just "big chunks of raw memory".
The references I used to find the RTS options were (clearly I wasn't particular about the GHC version - I can't imagine these flags change frequently):