assembly x86 reverse-engineering position-independent-code

Bomb Lab Assignment Phase 5 - Writing Its C Equivalent

I am trying to solve a slightly modified Bomb Lab problem for my Computer Architecture class. I'm supposed to write the C equivalent of each function, but I got stuck on Phase 5. It's very similar to this question, and I have figured out what the function does for the most part.

    105b:   56                      push   %esi
    105c:   53                      push   %ebx
    105d:   83 ec 10                sub    $0x10,%esp
    1060:   e8 6b fa ff ff          call   ad0 <__x86.get_pc_thunk.bx>
    1065:   81 c3 fb 3e 00 00       add    $0x3efb,%ebx
    106b:   8b 74 24 1c             mov    0x1c(%esp),%esi
    106f:   56                      push   %esi
    1070:   e8 bf 02 00 00          call   1334 <string_length>
    1075:   83 c4 10                add    $0x10,%esp
    1078:   83 f8 06                cmp    $0x6,%eax
    107b:   75 2e                   jne    10ab <phase_5+0x50>
    107d:   89 f0                   mov    %esi,%eax
    107f:   83 c6 06                add    $0x6,%esi
    1082:   b9 00 00 00 00          mov    $0x0,%ecx
    1087:   0f b6 10                movzbl (%eax),%edx
    108a:   83 e2 0f                and    $0xf,%edx
    108d:   03 8c 93 00 da ff ff    add    -0x2600(%ebx,%edx,4),%ecx
    1094:   83 c0 01                add    $0x1,%eax
    1097:   39 f0                   cmp    %esi,%eax
    1099:   75 ec                   jne    1087 <phase_5+0x2c>
    109b:   83 f9 34                cmp    $0x34,%ecx
    109e:   74 05                   je     10a5 <phase_5+0x4a>
    10a0:   e8 38 05 00 00          call   15dd <explode_bomb>
    10a5:   83 c4 04                add    $0x4,%esp
    10a8:   5b                      pop    %ebx
    10a9:   5e                      pop    %esi
    10aa:   c3                      ret    
    10ab:   e8 2d 05 00 00          call   15dd <explode_bomb>
    10b0:   eb cb                   jmp    107d <phase_5+0x22>

It's a function that accepts a 6-character string (the bomb explodes if the length is anything else) and runs a loop over the characters that produces a number. At the end, if the result of the loop does not equal 52 (0x34), the bomb explodes as well. However, I couldn't understand one part of the code:

    108d:   03 8c 93 00 da ff ff    add    -0x2600(%ebx,%edx,4),%ecx

Apparently, it masks the ASCII code of each character in the string and then maps the resulting number to some other value by an algorithm I can't identify. For now, I've made a table of the value added for each character and managed to get an accepted string, aaaabb, but I would like to know what the C equivalent of the code looks like.

Solution

Just like in Jester's answer, it's indexing an array: ecx += table[edx], for some static int table[]. The EDX index is scaled by 4 in the addressing mode because sizeof(int) is 4: asm needs byte offsets, while C indexing uses element offsets.
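
As a rough sketch of what that looks like in C (not the original source): the table contents below are placeholders, chosen only so that the question's string aaaabb sums to 52, and the real 16 values have to be dumped from the binary. The name phase_5_check, returning 0/1 instead of calling explode_bomb, and using strlen in place of string_length are all my assumptions.

```c
#include <string.h>

/* HYPOTHETICAL contents: only table[1] = 10 and table[2] = 6 matter here,
 * picked so that "aaaabb" gives 4*10 + 2*6 = 52. The real values live at
 * -0x2600(%ebx) in the binary and must be read from there. */
static const int table[16] = {
    2, 10, 6, 1, 12, 16, 9, 3, 4, 7, 14, 5, 11, 8, 15, 13
};

/* Returns 1 where the asm falls through to the normal return,
 * 0 where it would call explode_bomb. */
int phase_5_check(const char *input)
{
    /* call string_length; cmp $0x6,%eax; jne -> explode_bomb
     * (assumes string_length behaves like strlen) */
    if (strlen(input) != 6)
        return 0;

    int sum = 0;                                   /* mov $0x0,%ecx */
    for (const char *p = input; p != input + 6; p++) {
        /* movzbl (%eax),%edx; and $0xf,%edx */
        unsigned idx = (unsigned char)*p & 0xf;
        /* add -0x2600(%ebx,%edx,4),%ecx */
        sum += table[idx];
    }

    return sum == 0x34;                            /* cmp $0x34,%ecx */
}
```

Only the low 4 bits of each character are used, so many different strings map to the same indices; with these placeholder values, phase_5_check("aaaabb") passes and phase_5_check("aaaaaa") does not.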

-0x2600 + %ebx is the address of a static array, playing the same role as 0x804a4a0 in the linked question. It's harder to find in static disassembly, though, because whoever created this executable annoyingly compiled it as a 32-bit PIE (position-independent executable).

32-bit PIC / PIE sucks because PC-relative data addressing (RIP-relative) was new with x86-64; 32-bit code has to get its own address into a register first, so this is needlessly more complicated to reverse engineer.

The function first gets the GOT (Global Offset Table) address into EBX. call __x86.get_pc_thunk.bx returns its own return address in EBX, i.e. 0x1065 with the placeholder addresses you got from objdump -d. Then add $0x3efb,%ebx adds the offset from that location to the GOT.

Static data is then addressed relative to the GOT base (in EBX in this case), which is a pain in the ass to follow compared to an absolute address. You could just single-step a running process in a debugger, after the kernel's program loader has mapped the code to some virtual address (other than 0x1000).

Or do it manually: 0x1065 + 0x3efb = GOT base (EBX) of 0x4f60.
0x4f60 - 0x2600 = 0x2960 is the start of the lookup table array, which is the address to look for if you use objdump -D to dump the data sections as well. You might be able to use that address in GDB (before start or run) with an x command to dump the table in a more convenient format than objdump's fake disassembly of data as code.

The actual address in a running process will be that plus some multiple of 4096 (0x1000).
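
The same arithmetic, written out as a small C sketch. The constants come straight from the objdump listing above; the function names and the load_base parameter are made up for illustration, and the loader picks the real base at run time.

```c
#include <stdint.h>

/* Constants read off the disassembly. */
#define THUNK_RETURN_ADDR 0x1065u /* instruction after call __x86.get_pc_thunk.bx */
#define GOT_DELTA         0x3efbu /* add $0x3efb,%ebx */
#define TABLE_DISP        0x2600u /* magnitude of the -0x2600 displacement */

/* Value of EBX after the add: the GOT base, as a file-relative address. */
uint32_t got_base(void)
{
    return THUNK_RETURN_ADDR + GOT_DELTA;
}

/* Start of the lookup table: -0x2600(%ebx). */
uint32_t table_start(void)
{
    return got_base() - TABLE_DISP;
}

/* Address read by add -0x2600(%ebx,%edx,4),%ecx for a masked index 0..15. */
uint32_t table_entry_addr(uint32_t edx)
{
    return table_start() + edx * 4u;
}

/* Address in a running process, given a page-aligned load base
 * (hypothetical parameter: read the real base from the debugger). */
uint32_t runtime_addr(uint32_t load_base, uint32_t static_addr)
{
    return load_base + static_addr;
}
```

For example, table_entry_addr(1) is 0x2964, the entry the loop reads for 'a' (0x61 & 0xf == 1).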