c mips codewarrior metrowerks

How should I write my C code to have the resulting assembly use extra dsll32 and dsra32 instructions?


I'm decompiling a PS2 game that was shipped as a debug build. I've decompiled enough to be able to compile an ELF file using the compiler that was originally used (Metrowerks CodeWarrior).

Now I'm doing comparisons between the disassembly of the original ELF file and the disassembly of the one I compiled. There's one recurring pattern that the original assembly has that mine doesn't: regular shifting using the dsll32 and dsra32 instructions.

Original assembly:

dsll32  v1,s1,16
dsra32  v1,v1,16
subu    v1,$0,v1
dsll32  v1,v1,16
dsra32  v1,v1,16
dsll32  s1,v1,16
dsra32  s1,s1,16

This was decompiled to the following C code:

d1 = -d1;

And was compiled to the following assembly:

subu    v1,$0,s1
dsll32  s1,v1,16
dsra32  s1,s1,16

Notice that two of the three dsll32/dsra32 pairs are missing. So far, I've failed to replicate them. I've tried adding various casts and changing the statement to d1 = 0 - d1, and someone suggested adding a 64-bit cast, but nothing achieves the desired result.

Here's another example:

Original assembly:

lh      v0,80(sp)
dsll32  v1,v0,16
dsra32  v1,v1,16
lw      v0,128(sp)
lb      v0,0(v0)
dsll32  v0,v0,24
dsra32  v0,v0,24
dsll32  v0,v0,16
dsra32  v0,v0,16
addu    v0,v1,v0
dsll32  v0,v0,16
dsra32  v0,v0,16
dsll32  s3,v0,16
dsra32  s3,s3,16

C code:

x = xposi + sprdat->xoff;

Compiled to:

lh      v1,80(sp)
lw      v0,128(sp)
lb      v0,0(v0)
dsll32  v0,v0,24
dsra32  v0,v0,24
addu    v0,v1,v0
dsll32  s3,v0,16
dsra32  s3,s3,16

Does anyone have any idea what C code would be responsible for this?


Solution

  • The TL;DR: Don't worry, things are fine.

    1. You're decompiling a debug version. So, optimization is [probably] turned off.
    2. The original assembly looks like unoptimized code.
    3. The decompiled code is d1 = -d1;
    4. The resultant assembly of your recompiled code is optimized, and it also relies on an additional assumption about the register contents (see below).
    5. On an x86, this would produce a single instruction. If d1 were in (e.g.) %ecx, the assembly would be: neg %ecx
    6. On architectures that don't have that (e.g. mips), we subtract the value from 0.

    The following is a bit of a rough explanation.


    The C [decompiled] variable is d1. It resides in MIPS register s1. The code uses a temporary variable (call it tmp) that resides in MIPS register v1.

    To avoid confusion [possibly just mine :-)] over the arbitrary name d1 assigned by the decompiler, let's rename it mydata.

    In the original assembly, the first two instructions sign extend a 16 bit integer to the full width of the register. Note that dsll32 and dsra32 shift by 32 plus the encoded amount, so each of these shifts by 48 bits in the 64 bit register:

    dsll32  v1,s1,16            # shift left by 48: 16 bit int's sign bit moves to bit 63
    dsra32  v1,v1,16            # shift back by 48 with sign extension
    

    A logical right shift would shift zeroes in from the left.

    However ...

    An arithmetic right shift shifts copies of the sign bit (bit 63 here) in from the left.

    This is a common trick on a number of architectures that don't have (as x86 does) special CISC instructions for moving a 16 bit number into a wider register with automatic sign extension (e.g. movswl), either from memory or another register.

    For example, a -1 in a short would have a bit pattern of 0xFFFF (i.e. 0x0000FFFF). After the two shifts, every bit above bit 15 is a copy of the sign bit, so the low 32 bits read 0xFFFFFFFF.
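The shift pair can be modelled in C as a rough sketch. The helper name sign_extend16 is hypothetical, and the code assumes a compiler that implements >> on a negative signed value as an arithmetic shift (the C standard leaves this implementation-defined). The shift amount is 48 because dsll32 and dsra32 add 32 to the encoded amount of 16.

```c
#include <stdint.h>

/* Hypothetical helper (not from the game or the compiler): mirrors
   dsll32 reg,reg,16 followed by dsra32 reg,reg,16 on a 64 bit register. */
int64_t sign_extend16(uint64_t reg)
{
    /* dsll32 ...,16 shifts left by 32 + 16 = 48, so bit 15 lands in bit 63. */
    uint64_t shifted = reg << 48;

    /* dsra32 ...,16 shifts right by 48, copying bit 63 into the vacated bits.
       Assumption: the conversion to int64_t keeps the bit pattern and >> on a
       negative value is an arithmetic shift (true of common compilers). */
    return (int64_t)shifted >> 48;
}
```

Under those assumptions, sign_extend16(0xFFFF) yields -1, and any garbage above bit 15 in the input is discarded.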

    This uses an extra tmp variable that is put in register v1. The tmp is probably [implicitly] defined as:

    int tmp;
    

    This all implies that mydata (which resides in register s1) was defined as a 16 bit integer:

    short mydata;
    

    Although x86 has word size arithmetic instructions that operate directly on 16 bits (e.g. subw), MIPS has no 16 bit register-to-register arithmetic instructions: they operate on 32 bit (or wider) register values (addi is an exception only in that it sign extends its 16 bit immediate operand).
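The C language has a matching rule: operands narrower than int are promoted to int before arithmetic, so negating a short is computed as an int and only converted back on assignment. A minimal sketch follows; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: reports the size of the type in which -s is computed. */
size_t negated_short_width(void)
{
    short s = 1;
    /* The operand of unary minus undergoes integer promotion,
       so the expression -s has type int, not short. */
    return sizeof(-s);
}
```

On any conforming compiler the result equals sizeof(int), which is why the assembly widens the value before the subu and narrows it afterwards.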

    Now, to actually negate the number, we subtract it from zero:

    subu    v1,$0,v1                # perform negation
    

    The equivalent C code is:

    tmp = 0 - tmp;
    

    The next two instructions truncate the 32 bit result in tmp back to 16 bits and sign extend it again (dsll32 and dsra32 shift by 32 plus the encoded amount, i.e. by 48 here):

    dsll32  v1,v1,16                # shift left by 48: bit 15 of the result moves to bit 63
    dsra32  v1,v1,16                # shift back by 48 with sign extension
    

    The remaining two instructions do the same [and now unnecessary] conversion to produce the final 16 bit value for mydata:

    dsll32  s1,v1,16                # mydata = tmp << 48;
    dsra32  s1,s1,16                # mydata = mydata >> 48;
    

    The rebuilt assembly already assumes that s1 has been sign extended to 32 bits (in prior instructions). So, the first two instructions of the original assembly aren't needed:

    # assume s1 has _already_ been sign extended to 32 bits here in _prior_
    # instructions
    subu    v1,$0,s1
    
    # truncate the 32 bit result to 16 bits and sign extend it
    dsll32  s1,v1,16
    dsra32  s1,s1,16
    

    And, the extra unnecessary final two instructions are elided.


    To summarize: The recompiled code does exactly what the original code does, but is optimized to do it in fewer instructions.
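To make the equivalence concrete, here is a sketch that models each instruction group of the two listings as a C step. It describes what the instructions compute; it is not a claim about which C source makes CodeWarrior emit the longer sequence. The function names are hypothetical, and the type short for mydata is the inference made above.

```c
#include <stdint.h>

/* Models the original (unoptimized) sequence, one step per instruction group. */
int16_t negate_unoptimized(int16_t mydata)
{
    int32_t tmp = mydata;            /* dsll32/dsra32: widen mydata into tmp   */
    tmp = 0 - tmp;                   /* subu: negate                           */
    int16_t narrowed = (int16_t)tmp; /* dsll32/dsra32: truncate to 16 bits     */
    mydata = narrowed;               /* dsll32/dsra32: redundant second narrow */
    return mydata;
}

/* Models the recompiled sequence: negate, then narrow once. */
int16_t negate_optimized(int16_t mydata)
{
    return (int16_t)(0 - mydata);
}
```

Both functions return the same value for every input; only the number of intermediate conversions differs. Negating -32768 does not fit in 16 bits, and the conversion result is implementation-defined in C, so that edge case is left out of any comparison here.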

    The second example is doing much the same thing.
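The second listing can be modelled the same way. In this sketch, struct sprite_data and add_offset are hypothetical names; the types are inferred from the loads (lh for a 16 bit xposi, lb at offset 0 for an 8 bit xoff) and may not match the real declarations.

```c
#include <stdint.h>

/* Hypothetical layout: only a signed byte at offset 0 is implied by
   "lb v0,0(v0)"; the real structure surely has more members. */
struct sprite_data {
    int8_t xoff;
};

/* Models x = xposi + sprdat->xoff as the original assembly performs it. */
int16_t add_offset(int16_t xposi, const struct sprite_data *sprdat)
{
    int32_t lhs = xposi;              /* lh, then dsll32/dsra32 by 16       */
    int8_t off8 = sprdat->xoff;       /* lb, then dsll32/dsra32 by 24       */
    int16_t off16 = off8;             /* dsll32/dsra32 by 16                */
    int32_t sum = lhs + off16;        /* addu                               */
    int16_t narrowed = (int16_t)sum;  /* dsll32/dsra32 by 16: narrow result */
    int16_t x = narrowed;             /* dsll32/dsra32 by 16 into s3        */
    return x;
}
```

The recompiled version skips the conversions that change nothing (lh and lb already sign extend), which is why it is shorter but produces the same x.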