I have the following program for converting 6bit ASCII to binary format.
ascii2bin :: Char -> B.ByteString
ascii2bin = B.reverse . fst . B.unfoldrN 6 decomp . to6BitASCII -- replace to6BitASCII with ord if you want to compile this
where decomp n = case quotRem n 2 of (q,r) -> Just (chr r,q)
bs2bin :: B.ByteString -> B.ByteString
bs2bin = B.concatMap ascii2bin
this produces the following core segment:
Rec {
$wa
$wa =
\ ww ww1 ww2 w ->
case ww2 of wild {
__DEFAULT ->
let {
wild2
wild2 = remInt# ww1 2 } in
case leWord# (int2Word# wild2) (__word 1114111) of _ {
False -> (lvl2 wild2) `cast` ...;
True ->
case writeWord8OffAddr#
ww 0 (narrow8Word# (int2Word# (ord# (chr# wild2)))) w
of s2 { __DEFAULT ->
$wa (plusAddr# ww 1) (quotInt# ww1 2) (+# wild 1) s2
}
};
6 -> (# w, (lvl, lvl1, Just (I# ww1)) #)
}
end Rec }
notice that ord . chr == id
, and so there is a redundant operation here: narrow8Word# (int2Word# (ord# (chr# wild2)))
Is there a reason GHC is needlessly converting from Int -> Char -> Int, or is this an example of poor code generation? Can this be optimized out?
This is using GHC 7.4.2, I have not tried compiling with any other version. I have since found the problem remains in GHC 7.6.2, but the redundant operations are removed in the current HEAD branch on github.
Is there a reason GHC is needlessly converting from
Int -> Char -> Int
, or is this an example of poor code generation? Can this be optimized out?
Not really (to both). The core you get from -ddump-simpl
is not the end. There are a few optimisations and transformations still done after that on the way to the assembly code. But removing the redundant conversions here isn't actually an optimisation.
They can be, and are, removed between the core and the assembly. The point is that these primops - except for the narrowing - are no-ops, they are only present in the core because that's typed. Since they are no-ops, it doesn't matter whether there is a redundant chain of them in the core.
The assembly that 7.6.1 produces from the code [it's more readable than what 7.4.2 produces, so I take that] - with ord
instead of to6BitASCII
- is
ASCII.$wa_info:
_cXT:
addq $64,%r12
cmpq 144(%r13),%r12
ja _cXX
movq %rdi,%rcx
cmpq $6,%rdi
jne _cXZ
movq $GHC.Types.I#_con_info,-56(%r12)
movq %rsi,-48(%r12)
movq $Data.Maybe.Just_con_info,-40(%r12)
leaq -55(%r12),%rax
movq %rax,-32(%r12)
movq $(,,)_con_info,-24(%r12)
movq $lvl1_rVq_closure+1,-16(%r12)
movq $lvl_rVp_closure+1,-8(%r12)
leaq -38(%r12),%rax
movq %rax,0(%r12)
leaq -23(%r12),%rbx
jmp *0(%rbp)
_cXX:
movq $64,192(%r13)
_cXV:
movl $ASCII.$wa_closure,%ebx
jmp *-8(%r13)
_cXZ:
movl $2,%ebx
movq %rsi,%rax
cqto
idivq %rbx
movq %rax,%rsi
cmpq $1114111,%rdx
jbe _cY2
movq %rdx,%r14
addq $-64,%r12
jmp GHC.Char.chr2_info
_cY2:
movb %dl,(%r14)
incq %r14
leaq 1(%rcx),%rdi
addq $-64,%r12
jmp ASCII.$wa_info
.size ASCII.$wa_info, .-ASCII.$wa_info
The part where the narrow8Word# (int2Word# (ord# (chr# wild2)))
appears in the core is after the cmpq $1114111, %rdx
. If the quotient is not out-of-range, the code jumps to _cY2
which contains no such conversions anymore. A byte is written to the array, some pointers/counters are incremented, and that's it, jump back to the top.
I think it would be possible to generate better code from this than GHC currently does, but the redundant no-op conversions already vanish.