performancehaskellghc

GHC Generating Redundant Core Operations


I have the following program for converting 6bit ASCII to binary format.

ascii2bin :: Char -> B.ByteString
ascii2bin = B.reverse . fst . B.unfoldrN 6 decomp . to6BitASCII -- replace to6BitASCII with ord if you want to compile this
    where decomp n = case quotRem n 2 of (q,r) -> Just (chr r,q)

bs2bin :: B.ByteString -> B.ByteString
bs2bin = B.concatMap ascii2bin

this produces the following core segment:

Rec {
$wa
$wa =
  \ ww ww1 ww2 w ->
    case ww2 of wild {
      __DEFAULT ->
        let {
          wild2
          wild2 = remInt# ww1 2 } in
        case leWord# (int2Word# wild2) (__word 1114111) of _ { 
          False -> (lvl2 wild2) `cast` ...;                                                                                   
          True ->
            case writeWord8OffAddr#
                   ww 0 (narrow8Word# (int2Word# (ord# (chr# wild2)))) w
            of s2 { __DEFAULT ->
            $wa (plusAddr# ww 1) (quotInt# ww1 2) (+# wild 1) s2
            }   
        };  
      6 -> (# w, (lvl, lvl1, Just (I# ww1)) #)
    }   
end Rec }

notice that ord . chr == id, and so there is a redundant operation here: narrow8Word# (int2Word# (ord# (chr# wild2)))

Is there a reason GHC is needlessly converting from Int -> Char -> Int, or is this an example of poor code generation? Can this be optimized out?

This is using GHC 7.4.2, I have not tried compiling with any other version. I have since found the problem remains in GHC 7.6.2, but the redundant operations are removed in the current HEAD branch on github.


Solution

  • Is there a reason GHC is needlessly converting from Int -> Char -> Int, or is this an example of poor code generation? Can this be optimized out?

    Not really (to both). The core you get from -ddump-simpl is not the end. There are a few optimisations and transformations still done after that on the way to the assembly code. But removing the redundant conversions here isn't actually an optimisation.

    They can be, and are, removed between the core and the assembly. The point is that these primops - except for the narrowing - are no-ops, they are only present in the core because that's typed. Since they are no-ops, it doesn't matter whether there is a redundant chain of them in the core.

    The assembly that 7.6.1 produces from the code [it's more readable than what 7.4.2 produces, so I take that] - with ord instead of to6BitASCII - is

    ASCII.$wa_info:
    _cXT:
        addq $64,%r12
        cmpq 144(%r13),%r12
        ja _cXX
        movq %rdi,%rcx
        cmpq $6,%rdi
        jne _cXZ
        movq $GHC.Types.I#_con_info,-56(%r12)
        movq %rsi,-48(%r12)
        movq $Data.Maybe.Just_con_info,-40(%r12)
        leaq -55(%r12),%rax
        movq %rax,-32(%r12)
        movq $(,,)_con_info,-24(%r12)
        movq $lvl1_rVq_closure+1,-16(%r12)
        movq $lvl_rVp_closure+1,-8(%r12)
        leaq -38(%r12),%rax
        movq %rax,0(%r12)
        leaq -23(%r12),%rbx
        jmp *0(%rbp)
    _cXX:
        movq $64,192(%r13)
    _cXV:
        movl $ASCII.$wa_closure,%ebx
        jmp *-8(%r13)
    _cXZ:
        movl $2,%ebx
        movq %rsi,%rax
        cqto
        idivq %rbx
        movq %rax,%rsi
        cmpq $1114111,%rdx
        jbe _cY2
        movq %rdx,%r14
        addq $-64,%r12
        jmp GHC.Char.chr2_info
    _cY2:
        movb %dl,(%r14)
        incq %r14
        leaq 1(%rcx),%rdi
        addq $-64,%r12
        jmp ASCII.$wa_info
        .size ASCII.$wa_info, .-ASCII.$wa_info
    

    The part where the narrow8Word# (int2Word# (ord# (chr# wild2))) appears in the core is after the cmpq $1114111, %rdx. If the quotient is not out-of-range, the code jumps to _cY2 which contains no such conversions anymore. A byte is written to the array, some pointers/counters are incremented, and that's it, jump back to the top.

    I think it would be possible to generate better code from this than GHC currently does, but the redundant no-op conversions already vanish.