delphiassemblyx86string-comparison

Fast Delphi ASM ASCII string comparison optionally ignoring first char


I am looking to create a Delphi function that ignores the first character of either string if it starts with a Chr(127). This is used in a sort comparison which is extremely slow with 30,000 items when having to copy the text from the 2nd character before the comparison. This is the original function.

function CompareText(const S1, S2: string): Integer; assembler;
asm
        PUSH    ESI
        PUSH    EDI
        PUSH    EBX
        MOV     ESI,EAX
        MOV     EDI,EDX
        OR      EAX,EAX
        JE      @@0
        MOV     EAX,[EAX-4]
@@0:    OR      EDX,EDX
        JE      @@1
        MOV     EDX,[EDX-4]
@@1:    MOV     ECX,EAX
        CMP     ECX,EDX
        JBE     @@2
        MOV     ECX,EDX
@@2:    CMP     ECX,ECX
@@3:    REPE    CMPSB
        JE      @@6
        MOV     BL,BYTE PTR [ESI-1]
        CMP     BL,'a'
        JB      @@4
        CMP     BL,'z'
        JA      @@4
        SUB     BL,20H
@@4:    MOV     BH,BYTE PTR [EDI-1]
        CMP     BH,'a'
        JB      @@5
        CMP     BH,'z'
        JA      @@5
        SUB     BH,20H
@@5:    CMP     BL,BH
        JE      @@3
        MOVZX   EAX,BL
        MOVZX   EDX,BH
@@6:    SUB     EAX,EDX
        POP     EBX
        POP     EDI
        POP     ESI
end;

Solution

  • ignores the first character of either string if it starts with a Chr(127).

    I can certainly see how "having to copy the text from the 2nd character before the comparison" can get extremely slow.
    You don't need to copy anything. In the presence of Chr(127) you just consider the second position in the string as its true start. In code it means incrementing the addresses ESI/EDI and also decrementing the lengths EAX/EDX.

    For a better performance don't use BL and BH, but use full 32-bit registers, even if it means preserving an extra register like EBP.

         push    esi
         push    edi
         push    ebx
         push    ebp
    
         mov     esi, eax             ; 1st string
         test    eax, eax             ; NULL pointer ?
         jz      @@0
         mov     eax, [esi-4]         ; LEN
         test    eax, eax             ; Is length 0 ?
         jz      @@0
         cmp     byte ptr [esi], 127  ; Is first char Chr(127) ?
         jne     @@0
         inc     esi                  ; Skip Chr(127)
         dec     eax                  ; LEN--
    
    @@0: mov     edi, edx             ; 2nd string
         test    edx, edx             ; NULL pointer ?
         jz      @@1
         mov     edx, [edi-4]         ; LEN
         test    edx, edx             ; Is length 0 ?
         jz      @@1
         cmp     byte ptr [edi], 127  ; Is first char Chr(127) ?
         jne     @@1
         inc     edi                  ; Skip Chr(127)
         dec     edx                  ; LEN--
    
    @@1: mov     ecx, eax             ; Get shorter length to ECX
         cmp     eax, edx
         jbe     @@2
         mov     ecx, edx
    
    @@2: dec     ecx
         js      @@5                  ; `JS` bypasses the loop if ECX was initially zero
         movzx   ebx, byte ptr [esi]
         inc     esi
         movzx   ebp, byte ptr [edi]
         inc     edi
         cmp     ebx, ebp
         je      @@2
    
         cmp     ebx, 'A'
         jb      @@3
         cmp     ebx, 'Z'
         ja      @@3
         or      ebx, 32              ; LCase
    @@3: cmp     ebp, 'A'
         jb      @@4
         cmp     ebp, 'Z'
         ja      @@4
         or      ebp, 32              ; LCase
    @@4: cmp     ebx, ebp
         je      @@2
         mov     eax, ebx
         mov     edx, ebp
    
    @@5: sub     eax, edx
         pop     ebp
         pop     ebx
         pop     edi
         pop     esi
    

    I'm sure the loop and especially its LCase part could be further optimized, but I think the main issue about ignoring a leading Chr(127) got resolved...


    There is a small algorithmic problem

    Your main loop (@@3) essentially has two parts: one that does an 'as-is' comparison (cmpsb) and one that does a 'case-insensitive' comparison. Thing is that the latter would make sense if it were the only part in the loop, but, combined with the first part it should only be used for current bytes that are confirmed to be an alphabetical.

    I don't know whether any of your strings will include an ASCII code in the range from 91 to 96, but if that is the case then, currently it is possible for your original code to report wrongly on a comparison between any from [a,z] and any from ]Z,a[ due to the UCase happening, and similarly for my first code to report wrongly on a comparison between any from [A,Z] and any from ]Z,a[ due to the LCase happening.

    This is a revised code:

         ...
    
    @@2: dec     ecx
         js      @@4                  ; `JS` bypasses the loop if ECX was initially zero
         movzx   ebx, byte ptr [esi]
         inc     esi
         movzx   ebp, byte ptr [edi]
         inc     edi
         ; *** part 1 (compare as-is) ***
         cmp     ebx, ebp
         je      @@2
         ; *** part 2 (compare case-insensitively) ***
         or      ebx, 32              ; LCase
         sub     ebx, 'a'
         cmp     ebx, 25
         ja      @@3                  ; It is not an alphabetical
         or      ebp, 32              ; LCase
         sub     ebp, 'a'
         cmp     ebp, 25
         ja      @@3                  ; It is not an alphabetical
         cmp     ebx, ebp
         je      @@2                  ; Only their case was different
         mov     eax, ebx
         mov     edx, ebp
         jmp     @@4
    @@3: movzx   eax, byte ptr [esi-1]
         movzx   edx, byte ptr [edi-1]
    @@4: sub     eax, edx
         pop     ebp
         pop     ebx
         pop     edi
         pop     esi
    

    Incidentally, the new code should run faster now that I have applied the technique that Peter mentioned in a comment.