haskellfunctional-programmingghcireductionweak-head-normal-form

Haskells Weak Head Normal Form


I've stumbled over some irritating things. I know that haskell works with weak head normal form (WHNF) and I know what this is. Typing the following code into ghci (I am using the command :sprint which reduces the expression to WHNF to my knowledge.):

let intlist = [[1,2],[2,3]]
:sprint intlist

gives intlist = _ this makes totally sense to me.

let stringlist = ["hi","there"]
:sprint stringlist 

gives stringlist = [_,_] This already confuses me. But then:

let charlist = [['h','i'], ['t','h','e','r','e']]
:sprint charlist

surprisingly gives charlist = ["hi","there"]

As far as I understood Haskell, strings are nothing else than lists of chars, which seems to be confirmed by checking the types "hi" :: [Char] and ['h','i'] :: [Char].

I am confused, because in my understanding all three examples above are more or less the same (a list of lists) and should therefore reduce to the same WHNF, namely _. What am I missing?

Thanks


Solution

  • Note that :sprint does not reduce an expression to WHNF. If it did, then the following would give 4 rather than _:

    Prelude> let four = 2 + 2 :: Int
    Prelude> :sprint four
    four = _
    

    Rather, :sprint takes the name of a binding, traverses the internal representation of the binding's value, and shows the already "evaluated parts" (i.e., the parts that are constructors) while using _ as a placeholder for unevaluated thunks (i.e., the suspended lazy function calls). If the value is completely unevaluated, no evaluation will be done, not even to WHNF. (And if the value is completely evaluated, you'll get that, not just WHNF.)

    What you are observing in your experiments is a combination of polymorphic versus monomorphic numeric types, different internal representations for string literals versus explicit lists of characters, etc. Basically, you're observing technical differences in how different literal expressions are compiled to byte code. So, interpreting these implementation details as having something to do with WHNF is going to hopelessly confuse you. Generally, you should use :sprint as a debugging tool only, not as a way to learn about WHNF and the semantics of Haskell evaluation.

    If you really want to understand what :sprint is doing, you can turn on a few flags in GHCi to see how expressions are actually being handled and, so, eventually compiled to bytecode:

    > :set -ddump-simpl -dsuppress-all -dsuppress-uniques
    

    After this, we can see the reason your intlist gives _:

    > let intlist = [[1,2],[2,3]]
    ==================== Simplified expression ====================
    returnIO
      (: ((\ @ a $dNum ->
             : (: (fromInteger $dNum 1) (: (fromInteger $dNum 2) []))
               (: (: (fromInteger $dNum 2) (: (fromInteger $dNum 3) [])) []))
          `cast` <Co:10>)
         [])
    

    You can ignore the returnIO and the outer : call, and concentrate on the part that starts with ((\ @ a $dNum -> ...

    Here $dNum is the dictionary for the Num constraint. This means that the generated code hasn't yet resolved the actual type a in the type Num a => [[a]], so the entire expression is still represented as a function call taking a (dictionary for) an appropriate Num type. In other words, it's an unevaluated thunk, and we get:

    > :sprint intlist
    _
    

    On the other hand, specify the type as Int, and the code is completely different:

    > let intlist = [[1::Int,2],[2,3]]
    ==================== Simplified expression ====================
    returnIO
      (: ((: (: (I# 1#) (: (I# 2#) []))
             (: (: (I# 2#) (: (I# 3#) [])) []))
          `cast` <Co:6>)
         [])
    

    and so is the :sprint output:

    > :sprint intlist
    intlist = [[1,2],[2,3]]
    

    Similarly, literal strings and explicit lists of characters have completely different representations:

    > let stringlist = ["hi", "there"]
    ==================== Simplified expression ====================
    returnIO
      (: ((: (unpackCString# "hi"#) (: (unpackCString# "there"#) []))
          `cast` <Co:6>)
         [])
    
    > let charlist = [['h','i'], ['t','h','e','r','e']]
    ==================== Simplified expression ====================
    returnIO
      (: ((: (: (C# 'h'#) (: (C# 'i'#) []))
             (: (: (C# 't'#)
                   (: (C# 'h'#) (: (C# 'e'#) (: (C# 'r'#) (: (C# 'e'#) [])))))
                []))
          `cast` <Co:6>)
         [])
    

    and the differences in the :sprint output represents artifacts of which parts of the expression GHCi considers evaluated (explicit : constructors) versus unevaluated (the unpackCString# thunks).