In Chapter 8 of thinking with types I learned that the fmap Sum
part of
fastSum :: [Int] -> Int
fastSum = getSum . mconcat . fmap Sum
has a O(n)
runtime cost, whereas using coerce
instead avoids that overhead.
I know newtype
s have no representation overhead, but what I do not understand is what is the runtime effect of mapping a newtype constructor over a list. I thought this would only have a compile-time overhead, and it should be just O(1)
, since the compiler only needs to know the type of the fmap SomeNewtypeCtr
expression.
It's hard to understand what Haskell exactly does in such cases, because of optimization. Haskell only mandates what is the result, not how it is obtained.
Some possibilities:
fmap Sum
performs a list scan, and copies each cell and each element;fmap Sum
performs a list scan, and copies each cell but not the elements (the new cells point to the old elements);fmap Sum
does not scan the list at all and is automatically optimized to a no-op.I tried godbolt.org to inspect the generated Core and assembly. Note that it still uses an old GHC 8.6.2. Still, I turned optimization on (-O2
) and compiled
foo :: [Int] -> [Sum Int]
foo = fmap Sum
and obtained
foo = Example.foo1 `cast` (.....)
Example.foo1 = \ (v_a1iF :: [Int]) -> v_a1iF
Hence foo
becomes the identity function, suitably coerced, producing the assembly
movq %r14,%rbx
andq $-8,%rbx
jmp *(%rbx)
which should be, roughly speaking, the equivalent of an immediate return in the GHC runtime system.
Concluding: Data.Coerce.coerce
is very nice since it ensures a no-op, but even plain Haskell, after optimization, can be surprisingly efficient.