I've stumbled over some irritating things. I know that haskell works with weak head normal form (WHNF) and I know what this is. Typing the following code into ghci (I am using the command :sprint which reduces the expression to WHNF to my knowledge.):
let intlist = [[1,2],[2,3]]
:sprint intlist
gives intlist = _
this makes totally sense to me.
let stringlist = ["hi","there"]
:sprint stringlist
gives stringlist = [_,_]
This already confuses me. But then:
let charlist = [['h','i'], ['t','h','e','r','e']]
:sprint charlist
surprisingly gives charlist = ["hi","there"]
As far as I understood Haskell, strings are nothing else than lists of chars, which seems to be confirmed by checking the types "hi" :: [Char]
and ['h','i'] :: [Char]
.
I am confused, because in my understanding all three examples above are more or less the same (a list of lists) and should therefore reduce to the same WHNF, namely _. What am I missing?
Thanks
Note that :sprint
does not reduce an expression to WHNF. If it did, then the following would give 4
rather than _
:
Prelude> let four = 2 + 2 :: Int
Prelude> :sprint four
four = _
Rather, :sprint
takes the name of a binding, traverses the internal representation of the binding's value, and shows the already "evaluated parts" (i.e., the parts that are constructors) while using _
as a placeholder for unevaluated thunks (i.e., the suspended lazy function calls). If the value is completely unevaluated, no evaluation will be done, not even to WHNF. (And if the value is completely evaluated, you'll get that, not just WHNF.)
What you are observing in your experiments is a combination of polymorphic versus monomorphic numeric types, different internal representations for string literals versus explicit lists of characters, etc. Basically, you're observing technical differences in how different literal expressions are compiled to byte code. So, interpreting these implementation details as having something to do with WHNF is going to hopelessly confuse you. Generally, you should use :sprint
as a debugging tool only, not as a way to learn about WHNF and the semantics of Haskell evaluation.
If you really want to understand what :sprint
is doing, you can turn on a few flags in GHCi to see how expressions are actually being handled and, so, eventually compiled to bytecode:
> :set -ddump-simpl -dsuppress-all -dsuppress-uniques
After this, we can see the reason your intlist
gives _
:
> let intlist = [[1,2],[2,3]]
==================== Simplified expression ====================
returnIO
(: ((\ @ a $dNum ->
: (: (fromInteger $dNum 1) (: (fromInteger $dNum 2) []))
(: (: (fromInteger $dNum 2) (: (fromInteger $dNum 3) [])) []))
`cast` <Co:10>)
[])
You can ignore the returnIO
and the outer :
call, and concentrate on the part that starts with ((\ @ a $dNum -> ...
Here $dNum
is the dictionary for the Num
constraint. This means that the generated code hasn't yet resolved the actual type a
in the type Num a => [[a]]
, so the entire expression is still represented as a function call taking a (dictionary for) an appropriate Num
type. In other words, it's an unevaluated thunk, and we get:
> :sprint intlist
_
On the other hand, specify the type as Int
, and the code is completely different:
> let intlist = [[1::Int,2],[2,3]]
==================== Simplified expression ====================
returnIO
(: ((: (: (I# 1#) (: (I# 2#) []))
(: (: (I# 2#) (: (I# 3#) [])) []))
`cast` <Co:6>)
[])
and so is the :sprint
output:
> :sprint intlist
intlist = [[1,2],[2,3]]
Similarly, literal strings and explicit lists of characters have completely different representations:
> let stringlist = ["hi", "there"]
==================== Simplified expression ====================
returnIO
(: ((: (unpackCString# "hi"#) (: (unpackCString# "there"#) []))
`cast` <Co:6>)
[])
> let charlist = [['h','i'], ['t','h','e','r','e']]
==================== Simplified expression ====================
returnIO
(: ((: (: (C# 'h'#) (: (C# 'i'#) []))
(: (: (C# 't'#)
(: (C# 'h'#) (: (C# 'e'#) (: (C# 'r'#) (: (C# 'e'#) [])))))
[]))
`cast` <Co:6>)
[])
and the differences in the :sprint
output represents artifacts of which parts of the expression GHCi considers evaluated (explicit :
constructors) versus unevaluated (the unpackCString#
thunks).