haskellstrictnessunordered-containers

Why is HashMap not in normal form upon series of inserts?


I've been trying to ensure the strictness of an in-memory model of a Haskell program using ghc-heap-view package and the utils it provides when I noticed that my HashMaps don't seem to be in NF upon a series on inserts. I tried printing Heap tree and indeed it shows some thunks. I then tried another way of inserting elements (using union and singleton) and this time it comes out strict.

Could somebody please explain why is this so and advise if there's anything I can do to make insert behave the same way as the other method?

Here's my test code:

module Main where

import           Control.Exception           (evaluate)
import           Data.Foldable
import           Data.HashMap.Strict         (HashMap)
import qualified Data.HashMap.Strict         as HM
import           GHC.HeapView

test1 :: HashMap Int Int
test1 = foldl' (\m v -> HM.insert v v m) HM.empty [0..5]

test2 :: HashMap Int Int
test2 = foldl' (\m v -> HM.union (HM.singleton v v) m) HM.empty [0..5]

main :: IO ()
main = do
  putStrLn "HeapTree for test1"
  t1 <- evaluate test1
  buildHeapTree 10 (asBox t1) >>= print . ppHeapTree

  putStrLn "HeapTree for test2"
  t2 <- evaluate test2
  buildHeapTree 10 (asBox t2) >>= print . ppHeapTree

And here's the output:

HeapTree for test1
"BitmapIndexed ([ (_thunk (I# 0) (I# 0) 0), (_thunk (I# 1) (I# 1) 1), (Leaf (I# 2) (I# 2) 2), (Leaf (I# 3) (I# 3) 3), (Leaf (I# 4) (I# 4) 4), (Leaf (I# 5) (I# 5) 5) ]) 63"
HeapTree for test2
"BitmapIndexed ([ (Leaf (I# 0) (I# 0) 0), (Leaf (I# 1) (I# 1) 1), (Leaf (I# 2) (I# 2) 2), (Leaf (I# 3) (I# 3) 3), (Leaf (I# 4) (I# 4) 4), (Leaf (I# 5) (I# 5) 5) ]) 63"
(0.02 secs, 1,067,672 bytes)

Solution

  • When inserting a new, non-colliding key into a Leaf node, insert uses a helper function called two to generate a two-element map. The two function is lazy in the values of the keys, which leads to GHC creating thunks to create the two new Leaf nodes. This whole thing is pretty silly, because the keys are actually certain to be in WHNF by then. But (presumably because of a recursive go function) GHC doesn't realize that. The problem should be fixed in the next version of unordered-containers.