Tags: string, delphi, assembly, access-violation, basm

Delphi: Access violation when putting a string in an editbox?


Well, I am studying some inline assembly in Delphi, and the assembly crypto routine is working great until I try to put the resulting ShortString into the edit box.

The violation I get is as follows: [screenshot: Error]

The full code is here:

procedure TForm2.Button1Click(Sender: TObject);

var
len,keylen:integer;
name, key:ShortString;

begin

name :=  ShortString(Edit1.Text);
key := '_r <()<1-Z2[l5,^';
len := Length(name);
keylen := Length(key);

nameLen := len;
serialLen := keyLen;

asm

  XOR EAX,EAX
  XOR ESI,ESI
  XOR EDX,EDX
  XOR ECX,ECX


  @loopBegin:

        MOV EAX,ESI
        PUSH $019
        CDQ
        IDIV DWORD PTR DS:[serialLen]
        MOV EAX,ESI
        POP EBX
        LEA ECX,DWORD PTR DS:[key+EDX]
        CDQ
        IDIV DWORD PTR DS:[nameLen]
        LEA EAX,DWORD PTR DS:[name]
        MOVZX EAX,BYTE PTR DS:[name+EDX]
        MOVZX EDX,BYTE PTR DS:[ECX]
        XOR EAX,EDX
        CDQ
        IDIV EBX
        ADD DL,$041
        INC ESI
        CMP ESI,DWORD PTR DS:[serialLen]
        MOV BYTE PTR DS:[ECX],DL

        JL @loopBegin


end;

edit2.Text:= TCaption(key);


end;

If I place a breakpoint on the line "edit2.Text:= TCaption(key);", I can see that the ShortString "key" has indeed been properly encrypted, but with a lot of weird characters after it, too.

The first 16 characters are the real encryption.

Screenshot of the encryption: http://img831.imageshack.us/img831/365/29944312.png


thanks!


Solution

  • What the code does

    For those of you who don't speak assembler, this is what the code is probably supposed to do, written in Pascal. "Probably", because the original contains some bugs:

    procedure TForm14.Button1Click(Sender: TObject);
    var KeyLen:Integer;
        Name, Key:ShortString;
        i:Integer;
        CurrentKeyByte:Byte;
        CurrentNameByte:Byte;
    begin
      Name := ShortString(Edit1.Text);
      Key := '_r <()<1-Z2[l5,^';
      keyLen := Length(key);
    
      asm int 3 end; // This is here so I can inspect the assembler output in the IDE
                     // for the "Optimised" version of the code
    
      for i:=1 to Length(Name) do
      begin
        CurrentKeyByte := Byte(Key[(i - 1) mod KeyLen + 1]); // wraps 1..KeyLen; Key[0] is the length byte
        CurrentNameByte := Byte(Name[i]);
        CurrentNameByte := ((CurrentKeyByte xor CurrentNameByte) mod $019) + $041;
        Name[i] := AnsiChar(CurrentNameByte);
      end;
    
      Caption := Name;
    
    end;
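Wrapping a 1-based index with a plain `i mod KeyLen` is an easy slip in this kind of loop: it yields 0 whenever `i` is a multiple of `KeyLen`, and `Key[0]` of a ShortString is its length byte, not a character. The usual 1-based wrap-around is `(i - 1) mod KeyLen + 1`. A small console sketch comparing the two; `WrapIndex` is a made-up helper name, not part of the original code:

```pascal
program WrapIndexDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: maps 1, 2, 3, ... onto 1..KeyLen, 1..KeyLen, ...
function WrapIndex(I, KeyLen: Integer): Integer;
begin
  Result := (I - 1) mod KeyLen + 1;
end;

const
  KeyLength = 16; // Length('_r <()<1-Z2[l5,^')
var
  I: Integer;
begin
  for I := 15 to 18 do
    WriteLn(I, ': ', I mod KeyLength, ' vs ', WrapIndex(I, KeyLength));
  // 15: 15 vs 15
  // 16: 0 vs 16   <- "I mod KeyLength" would pick Key[0], the length byte
  // 17: 1 vs 1
  // 18: 2 vs 2
end.
```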
    

    With optimizations turned on, the assembler code Delphi generates for this is actually shorter than the proposed code, contains no redundant code and, I'm willing to bet, is faster. Here are a few optimizations I noticed in the Delphi-generated code (compared to the assembler code proposed by the OP):

    Why is the provided assembler code failing?

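    Here's the original assembler code, with comments. The bug's at the end of the routine, at the "CMP" instruction: it compares ESI to the length of the KEY, not to the length of the NAME. On top of that, the mangled byte is written through ECX, which points into KEY, not NAME, and ESI starts at 0. In a ShortString, offset 0 is not the first character but the length byte (there is no NULL terminator), so the first iteration reads NAME's length byte and overwrites KEY's length byte with a value between $41 and $59. That is why the debugger shows the correct chars followed by funny ones: KEY now claims to be 65 to 89 characters long, and everything past the original 16 characters is whatever happened to be on the stack.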

    Overwriting EBX and ESI without restoring them is not allowed either (an asm block must preserve EBX, ESI, EDI, EBP and ESP), but that is not what's causing the code to AV, probably because the surrounding Delphi code didn't use EBX or ESI (I just tried this).
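A ShortString stores its length in byte 0 and has no terminator, so a write to offset 0 is enough to change the string's length. A minimal console check of that layout; `LengthByte` is a made-up helper for this sketch:

```pascal
program ShortStringLayout;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: the raw value of byte 0, which holds the length.
function LengthByte(const S: ShortString): Byte;
begin
  Result := Ord(S[0]);
end;

var
  S: ShortString;
begin
  S := 'ABC';
  WriteLn(LengthByte(S), ' ', Length(S));  // 3 3

  // Overwriting byte 0 changes the length: S now also "contains"
  // S[4] and S[5], whatever bytes happen to be there.
  S[0] := AnsiChar(5);
  WriteLn(LengthByte(S), ' ', Length(S));  // 5 5
end.
```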

    asm
    
      XOR EAX,EAX // Wasteful, the first instruction in the loop overwrites EAX
      XOR ESI,ESI
      XOR EDX,EDX // Wasteful, the first CDQ instruction in the loop overwrites EDX
      XOR ECX,ECX // Wasteful, the first LEA instruction overwrites ECX


      @loopBegin:
        // Entering the loop, ESI holds the index of the next char to be
        // encrypted.

        MOV EAX,ESI // Load EAX with the index of the next char, because
                    // we intend to do some divisions (setting up the IDIV)
        PUSH $019   // ? Pushing this here, so we can pop it 3 lines later... obfuscation
        CDQ         // Sign-extend EAX into EDX (required for IDIV)
        IDIV DWORD PTR DS:[serialLen] // Divide EAX by the length of the key.
        MOV EAX,ESI // Load the index back into EAX, we're planning on another IDIV. Why???
        POP EBX     // Remember the PUSH $019?
        LEA ECX,DWORD PTR DS:[key+EDX] // EDX is the remainder of "ESI mod serialLen"; this
                                       // loads the address of the current char in the
                                       // encryption key into ECX. Dividing by serialLen
                                       // is supposed to make sure we "wrap around" at the
                                       // end of the key
        CDQ // Yet more obfuscation. We're now extending EAX into EDX in preparation for IDIV.
            // This is obfuscation because the "MOV EAX,ESI" instruction could be written right
            // here, before the CDQ.
        IDIV DWORD PTR DS:[nameLen] // We divide the current index by the length of the text
                                    // to be encrypted. Once more the code will only use the remainder,
                                    // but why would one do this? Isn't ESI (the index) always supposed to
                                    // be LESS THAN nameLen? This is the first sign of trouble.
        LEA EAX,DWORD PTR DS:[name] // EAX now holds the address of NAME (dead code: the next
                                    // instruction overwrites EAX).
        MOVZX EAX,BYTE PTR DS:[name+EDX] // EAX holds the current character in name
        MOVZX EDX,BYTE PTR DS:[ECX]      // EDX holds the current character in Key
        XOR EAX,EDX // Aha!!!! So this is an obfuscated XOR loop! EAX holds
                    // "name[ESI mod nameLen] xor key[ESI mod serialLen]"
        CDQ         // We're extending EAX (the XOR result) in preparation for a divide
        IDIV EBX    // Divide EAX by EBX (EBX = $019). Why????
        ADD DL,$041 // EDX now holds the remainder of our previous XOR after the division by $019;
                    // this is a number from $000 to $018. Adding $041 turns it into a number from
                    // $041 to $059 (ASCII chars from "A" to "Y"). Now I get it. This is not encryption,
                    // this is a HASH function! One can't un-encrypt this (information is thrown away at
                    // the division).
        INC ESI     // Prep for the next char


        // !!! BUG !!!
        //
        // At this step the code compares ESI (the current char index) to the length of
        // the KEY and loops back if "ESI < serialLen", so the loop runs Length(Key) times
        // no matter how long NAME is. Worse, ESI starts at 0: offset 0 of a ShortString
        // is its length byte, so the first iteration reads NAME's length byte and
        // overwrites KEY's length byte, while the last char of KEY is never touched.
        //
        CMP ESI,DWORD PTR DS:[serialLen]


        MOV BYTE PTR DS:[ECX],DL // Obfuscation again. This is where the mangled char is written
                                 // back, and ECX points into "Key", not "Name".

        JL @loopBegin            // Repeat the loop (MOV does not change the CMP flags).
    end;
    
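To check that reading of the listing, here is a sketch that replays the loop in plain Pascal while keeping the asm's raw 0-based byte offsets. `ReplayOriginalLoop` is a hypothetical name; `nameLen` and `serialLen`, which the question's snippet assigns but does not declare, become locals here, and both strings are assumed non-empty:

```pascal
program ReplayOriginalLoopDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: replays the BASM loop on a copy of Key, using the same
// raw 0-based offsets as ESI, so offset 0 is each ShortString's length byte.
// Assumes Name and Key are non-empty (the asm would IDIV by zero otherwise).
function ReplayOriginalLoop(const Name: ShortString; Key: ShortString): ShortString;
var
  NameLen, SerialLen, I, KeyOfs, NameOfs: Integer;
begin
  NameLen := Length(Name);   // nameLen in the question
  SerialLen := Length(Key);  // serialLen: a separate variable, so it keeps the original length
  for I := 0 to SerialLen - 1 do       // ESI = 0 .. serialLen - 1 (INC/CMP/JL)
  begin
    KeyOfs := I mod SerialLen;         // first IDIV:  EDX = ESI mod serialLen
    NameOfs := I mod NameLen;          // second IDIV: EDX = ESI mod nameLen
    // XOR, IDIV EBX ($19), ADD DL,$41, then MOV [ECX],DL: the result lands in Key
    Key[KeyOfs] := AnsiChar(((Ord(Name[NameOfs]) xor Ord(Key[KeyOfs])) mod $19) + $41);
  end;
  Result := Key;
end;

var
  Serial: ShortString;
begin
  Serial := ReplayOriginalLoop('abc', '_r <()<1-Z2[l5,^');
  WriteLn(Length(Serial));              // 84, not 16
  WriteLn(Serial[1], ' ', Serial[16]);  // M ^  (offset 16 is never written)
end.
```

With NAME = 'abc', byte 0 of NAME is #3 and byte 0 of KEY is #16; 3 xor 16 = 19, 19 mod $19 = 19, and 19 + $41 = 84, which becomes KEY's new length. In the question, offsets 17 to 84 of the local `key` are uninitialized stack memory, hence the funny chars.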

    My 2 cents worth of advice

    Assembler should be used for SPEED optimizations and nothing else. It looks to me as if the OP tried to use assembler to obfuscate what the code is doing. That didn't help: it only took me a few minutes to figure out exactly what the code is doing, and I'm NOT an assembler expert.
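In that spirit, the OP's loop can be written in plain Pascal. The sketch below assumes the intent was to derive a serial with one char per KEY char, reading NAME with wrap-around (the asm's loop count and write target suggest this, but the question never says so). `MakeSerial` is a hypothetical name. Indices are 1-based, so no length byte is ever read or written:

```pascal
program MakeSerialDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: one output char per Key char, reading Name with
// wrap-around. Indices are 1-based, so no length byte is read or written.
// Assumes Name is non-empty; an empty Name would divide by zero.
function MakeSerial(const Name, Key: ShortString): ShortString;
var
  I: Integer;
begin
  Result := Key;  // same length as Key; every char gets replaced below
  for I := 1 to Length(Key) do
    Result[I] := AnsiChar(
      ((Ord(Name[(I - 1) mod Length(Name) + 1]) xor Ord(Key[I])) mod $19) + $41);
end;

begin
  WriteLn(MakeSerial('abc', '_r <()<1-Z2[l5,^'));  // MQRSYYSIDJFGNMEN
end.
```

Every output char falls in 'A'..'Y', and the result is always exactly Length(Key) chars long. An empty NAME still needs a guard in real code, just as in the original.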