Assembly: calculation of string index


As a beginner in assembly, I've been practicing disassembly and reverse engineering of Intel x86 code in IDA.

The program I'm currently trying to figure out validates the user-supplied password by building its own "validation password" and comparing the two with strcmp. If they match, the user-supplied password is accepted.

The validation password is built by a loop that runs 15 times (loop_counter goes from 0 to 0Eh, and the loop exits once it reaches 0Fh). The characters for the password come from a label named CHARACTERS, which stores the address of the string aAbcdefghijklmn.

aAbcdefghijklmn is defined as 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890'

var_40 is defined as -40h (-64 in decimal).

mov     [ebp+loop_counter], 0
loc_8049201:
cmp     [ebp+loop_counter], 0Fh ; Compare Two Operands
jge     loc_804922F     ; Jump if Greater or Equal (SF=OF)
mov     eax, CHARACTERS
mov     ecx, [ebp+loop_counter]
mov     ecx, [ebp+ecx*4+var_40]
mov     dl, [eax+ecx]
mov     eax, [ebp+loop_counter]
mov     [ebp+eax+validation_password], dl
mov     eax, [ebp+loop_counter]
add     eax, 1          ; Add
mov     [ebp+loop_counter], eax
jmp     loc_8049201     ; Jump
loc_804922F:
lea     eax, [ebp+validation_password] ; Load Effective Address
mov     ecx, [ebp+user_password]
mov     edx, esp
mov     [edx+4], ecx
mov     [edx], eax
call    _strcmp         ; Call Procedure
cmp     eax, 0          ; Compare Two Operands
jnz     loc_804925D     ; Jump if Not Zero (ZF=0)
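Read as C, the loop and the strcmp call above amount to roughly the following. This is a sketch, not recovered source: check_password and indices are hypothetical names, and the 16-byte buffer with an explicit terminator is an assumption, since the listing never shows a NUL being written after the 15 characters.

```c
#include <assert.h>
#include <string.h>

/* The alphabet that CHARACTERS points at (aAbcdefghijklmn in IDA). */
static const char *CHARACTERS =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";

/* Hypothetical reconstruction of the loop at loc_8049201 plus the strcmp.
 * indices plays the role of the int array at ebp+var_40. */
static int check_password(const int indices[15], const char *user_password)
{
    char validation_password[16];   /* assumed size: 15 chars + NUL */

    /* cmp [ebp+loop_counter], 0Fh / jge loc_804922F: loop while i < 15 */
    for (int i = 0; i < 0xF; i++) {
        assert(indices[i] >= 0 && indices[i] < 62);  /* stay inside alphabet */
        validation_password[i] = CHARACTERS[indices[i]];
    }
    validation_password[15] = '\0'; /* assumption: terminator not in listing */

    /* strcmp() == 0 means the jnz is not taken (the passwords match). */
    return strcmp(validation_password, user_password) == 0;
}
```

Note that loop_counter is used twice per iteration: scaled by 4 to pick an int from the index array, and unscaled to pick the destination byte in validation_password.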

This portion creates the validation password.

mov     eax, CHARACTERS
mov     ecx, [ebp+loop_counter]
mov     ecx, [ebp+ecx*4+var_40]
mov     dl, [eax+ecx]
mov     eax, [ebp+loop_counter]
mov     [ebp+eax+validation_password], dl
mov     eax, [ebp+loop_counter]
add     eax, 1          ; Add
mov     [ebp+loop_counter], eax
jmp     loc_8049201     ; Jump

What I cannot for the life of me figure out is how the index into the 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890' string is calculated on each iteration of the loop.

What I understand is that the character is read from the string's address (loaded from CHARACTERS into eax) plus the value of [ebp+ecx*4+var_40] as the offset, which gives the specific index. That byte is then stored in dl.

I do not know how to determine the index number from the memory address calculation [ebp+ecx*4+var_40] on each iteration.
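The address expression itself is plain arithmetic: EBP, plus the constant var_40 (-40h), plus ECX scaled by 4. A minimal C sketch of where each iteration reads from, assuming the array at ebp+var_40 holds 15 four-byte ints (3Ch bytes); element_offset is a hypothetical helper, not something in the binary:

```c
#include <assert.h>
#include <stdint.h>

/* IDA's var_40: offset of the int array's first element from EBP (-40h). */
#define VAR_40 (-0x40)

/* Hypothetical helper: the offset from EBP that [ebp+ecx*4+var_40]
 * reaches when ECX holds loop counter i. Assumes 15 four-byte ints. */
static int32_t element_offset(int32_t i)
{
    assert(i >= 0 && i < 15);
    return VAR_40 + i * 4;   /* scale 4 = size of one int element */
}
```

For i = 0, 1, 2 this gives -40h, -3Ch, -38h, so each iteration reads the next int in the array. That int's value, not its address, is what becomes the index into the string.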

EDIT:

Initialization of the array at ebp+var_40 is done with _memcpy earlier in the same function.

push    ebp
mov     ebp, esp
push    esi
sub     esp, 64h
mov     eax, [ebp+user_password]
xor     ecx, ecx
lea     edx, unk_804A064 ; db 3
lea     esi, [ebp+var_40]
mov     [esp], esi
mov     [esp+4], edx
mov     dword ptr [esp+8], 3Ch
mov     [ebp+var_58], eax
mov     [ebp+var_5C], ecx
call    _memcpy

unk_804A064 is defined as db 3. Starting at unk_804A064, the next 60 (3Ch) bytes are:

3, 0, 0, 0, 34h, 0, 0, 0, 38h, 0, 0, 0, 1Ah, 0, 0, 0, 2Ch, 0, 0, 0, 2Ch, 0, 0, 0, 1Eh, 0, 0, 0, 26h, 0, 0, 0, 1Bh, 0, 0, 0, 25h, 0, 0, 0, 32h, 0, 0, 0, 13h, 0, 0, 0, 37h, 0, 0, 0, 2Ch, 0, 0, 0, 0Ah, 0, 0, 0
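Because x86 is little-endian, each group of four bytes above is one int with its value in the low byte, which is how [ebp+ecx*4+var_40] reads it after the memcpy. A minimal C sketch of that decoding (table_bytes and table_entry are hypothetical names) yields the 15 indices 3, 52, 56, 26, 44, 44, 30, 38, 27, 37, 50, 19, 55, 44, 10:

```c
#include <assert.h>
#include <stdint.h>

/* The 60 (0x3C) bytes starting at unk_804A064, as listed above. */
static const uint8_t table_bytes[0x3C] = {
    0x03,0,0,0, 0x34,0,0,0, 0x38,0,0,0, 0x1A,0,0,0, 0x2C,0,0,0,
    0x2C,0,0,0, 0x1E,0,0,0, 0x26,0,0,0, 0x1B,0,0,0, 0x25,0,0,0,
    0x32,0,0,0, 0x13,0,0,0, 0x37,0,0,0, 0x2C,0,0,0, 0x0A,0,0,0,
};

/* Assemble element i as a little-endian 32-bit int (low byte first). */
static int32_t table_entry(int i)
{
    assert(i >= 0 && i < 15);
    const uint8_t *p = table_bytes + 4 * i;
    return (int32_t)((uint32_t)p[0]       | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}
```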


Solution

  • It's a byte gather operation, using int indices from another array on the stack at ebp+var_40. That array is filled by the _memcpy shown in the edit, which copies 3Ch (60) bytes, i.e. 15 ints, from unk_804A064.

    mov ecx, [ebp+loop_counter] loads ECX with the loop counter. (An optimized build would just keep that in a register the whole time; this debug build produces a lot of extra instructions to wade through, making it harder to reverse-engineer, but also simpler because you know each block of asm corresponds to a C statement, and there are named local variables on the stack).

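    mov ecx, [ebp+ecx*4+var_40] loads ECX with an int from a stack array, overwriting the loop counter it held. The address is EBP + var_40 + i*4: var_40 (-40h) is the offset of the array's first element from EBP, and the *4 scale steps over 4-byte ints. This is like int tmp = indices[i]; where int indices[n]; is in automatic storage (on the stack).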

    mov dl, [eax+ecx] uses that as an index into your alphabet string (EAX was loaded from CHARACTERS earlier). So this is char c = (*CHARACTERS)[tmp];

    The next two instructions, mov eax, [ebp+loop_counter] / mov [ebp+eax+validation_password], dl, are validation_password[i] = c; (no *4 scale here, because validation_password is a char array).
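Putting the pieces together, a C sketch of the whole gather using the 15 indices decoded from unk_804A064 shows the string the loop builds. ALPHABET, INDICES, and gather are hypothetical names, and the NUL terminator is an assumption because the posted assembly does not show it being written:

```c
#include <assert.h>
#include <stddef.h>

static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";

/* The 15 little-endian ints copied from unk_804A064 by the memcpy. */
static const int INDICES[15] = {
    3, 52, 56, 26, 44, 44, 30, 38, 27, 37, 50, 19, 55, 44, 10
};

/* Byte gather: out[i] = ALPHABET[INDICES[i]]. out must hold 16 bytes. */
static void gather(char out[16])
{
    for (size_t i = 0; i < 15; i++) {
        /* sizeof ALPHABET - 1 == 62 usable characters */
        assert(INDICES[i] >= 0 && (size_t)INDICES[i] < sizeof ALPHABET - 1);
        out[i] = ALPHABET[INDICES[i]];
    }
    out[15] = '\0';   /* assumed terminator; not visible in the posted asm */
}
```

With this data the buffer becomes D15assemblyT4sK. For example, index 52 (34h) lands on '1' because the digits start at offset 52, after the 26 upper-case and 26 lower-case letters.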