Tags: java, c++, operators, micro-optimization, premature-optimization

x > -1 vs x >= 0: is there a performance difference?


I heard a teacher mention this once, and it has been bugging me ever since. Say we want to check whether the integer x is greater than or equal to 0. There are two ways to check this:

if (x > -1){
    //do stuff
}

and

if (x >= 0){
    //do stuff
} 

According to this teacher, > would be slightly faster than >=. In this case it was Java, but according to him this also applies to C, C++ and other languages. Is there any truth to this statement?


Solution

  • There's no difference in any real-world sense.

    Let's take a look at some code generated by various compilers for various targets.
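    The listings refer to two functions, cmp_gt and cmp_gte, that pass a string to my_puts(). A minimal sketch of test code consistent with those names follows. It is an assumption, not the exact source that was compiled: the "negative" string and the last_message variable are made up for illustration, and my_puts() is defined locally only to keep the example self-contained (in a real experiment it would more likely live in another file so the compiler cannot inline it).

```c
#include <stdio.h>

/* Hypothetical stand-in for the my_puts() helper named in the listings.
   It remembers the last string it was given so the result can be inspected. */
static const char *last_message = NULL;

void my_puts(const char *s)
{
    last_message = s;
    puts(s);
}

/* Tests for a non-negative value using the (x > -1) spelling. */
void cmp_gt(int x)
{
    if (x > -1) {
        my_puts("non-negative");
    } else {
        my_puts("negative");   /* assumed text; not visible in the listings */
    }
}

/* Tests for a non-negative value using the (x >= 0) spelling. */
void cmp_gte(int x)
{
    if (x >= 0) {
        my_puts("non-negative");
    } else {
        my_puts("negative");   /* assumed text; not visible in the listings */
    }
}
```

    Compiling something like this with optimization enabled and asking the compiler for an assembly listing is enough to reproduce the kind of comparison shown here.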

    And here's what each of them produced for the comparison operations:

    MSVC 11 targeting ARM:

    // if (x > -1) {...
    00000        |cmp_gt| PROC
      00000 f1b0 3fff    cmp         r0,#0xFFFFFFFF
      00004 dd05         ble         |$LN2@cmp_gt|
    
    
    // if (x >= 0) {...
      00024      |cmp_gte| PROC
      00024 2800         cmp         r0,#0
      00026 db05         blt         |$LN2@cmp_gte|
    

    MSVC 11 targeting x64:

    // if (x > -1) {...
    cmp_gt  PROC
      00000 83 f9 ff     cmp     ecx, -1
      00003 48 8d 0d 00 00                  // speculative load of argument to my_puts()
        00 00        lea     rcx, OFFSET FLAT:$SG1359
      0000a 7f 07        jg  SHORT $LN5@cmp_gt
    
    // if (x >= 0) {...
    cmp_gte PROC
      00000 85 c9        test    ecx, ecx
      00002 48 8d 0d 00 00                  // speculative load of argument to my_puts()
        00 00        lea     rcx, OFFSET FLAT:$SG1367
      00009 79 07        jns     SHORT $LN5@cmp_gte
    

    MSVC 11 targeting x86:

    // if (x > -1) {...
    _cmp_gt PROC
      00000 83 7c 24 04 ff   cmp     DWORD PTR _x$[esp-4], -1
      00005 7e 0d        jle     SHORT $LN2@cmp_gt
    
    
    // if (x >= 0) {...
    _cmp_gte PROC
      00000 83 7c 24 04 00   cmp     DWORD PTR _x$[esp-4], 0
      00005 7c 0d        jl  SHORT $LN2@cmp_gte
    

    GCC 4.6.1 targeting x64:

    // if (x > -1) {...
    cmp_gt:
        .seh_endprologue
        test    ecx, ecx
        js  .L2
    
    // if (x >= 0) {...
    cmp_gte:
        .seh_endprologue
        test    ecx, ecx
        js  .L5
    

    GCC 4.6.1 targeting x86:

    // if (x > -1) {...
    _cmp_gt:
        mov eax, DWORD PTR [esp+4]
        test    eax, eax
        js  L2
    
    // if (x >= 0) {...
    _cmp_gte:
        mov edx, DWORD PTR [esp+4]
        test    edx, edx
        js  L5
    

    GCC 4.4.1 targeting ARM:

    // if (x > -1) {...
    cmp_gt:
        .fnstart
    .LFB0:
        cmp r0, #0
        blt .L8
    
    // if (x >= 0) {...
    cmp_gte:
        .fnstart
    .LFB1:
        cmp r0, #0
        blt .L2
    

    IAR 5.20 targeting an ARM Cortex-M3:

    // if (x > -1) {...
    cmp_gt:
    80B5 PUSH     {R7,LR}
    .... LDR.N    R1,??DataTable1  ;; `?<Constant "non-negative">`
    0028 CMP      R0,#+0
    01D4 BMI.N    ??cmp_gt_0
    
    // if (x >= 0) {...
    cmp_gte:
     80B5 PUSH     {R7,LR}
     .... LDR.N    R1,??DataTable1  ;; `?<Constant "non-negative">`
     0028 CMP      R0,#+0
     01D4 BMI.N    ??cmp_gte_0
    

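    If you're still with me, here are the differences of any note between evaluating (x > -1) and (x >= 0) that show up:

      • MSVC targeting ARM uses cmp r0,#0xFFFFFFFF for (x > -1) versus cmp r0,#0 for (x >= 0). The first instruction's encoding is two bytes longer (f1b0 3fff versus 2800).
      • MSVC targeting x64 uses cmp ecx, -1 for (x > -1) versus test ecx, ecx for (x >= 0). The first instruction's encoding is one byte longer (83 f9 ff versus 85 c9).
      • MSVC targeting x86 compares against -1 in one case and against 0 in the other; both encodings are five bytes long.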

    Note that GCC and IAR generated identical machine code for the two kinds of comparison (with the possible exception of which register was used). So according to this survey, it appears that (x >= 0) has an ever-so-slight chance of being 'faster'. But whatever advantage the minimally shorter opcode byte encoding might have (and I stress might have) will certainly be completely overshadowed by other factors.
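    A compiler is free to emit the same code for both spellings because, for a signed int, they always give the same answer. The sketch below checks that over a sample of values; the function names and the size of the sampled window are arbitrary choices made for illustration.

```c
#include <limits.h>

/* Returns 1 when both spellings of the test give the same answer for x. */
int comparisons_agree(int x)
{
    return (x > -1) == (x >= 0);
}

/* Counts disagreements over a sample of values: a window around zero
   plus the two extremes of int. The window size is an arbitrary choice. */
long count_disagreements(void)
{
    long bad = 0;
    for (int x = -10000; x <= 10000; ++x) {
        if (!comparisons_agree(x)) {
            ++bad;
        }
    }
    if (!comparisons_agree(INT_MIN)) {
        ++bad;
    }
    if (!comparisons_agree(INT_MAX)) {
        ++bad;
    }
    return bad;
}
```

    This only holds for signed types. With an unsigned x, the -1 in (x > -1) is converted to the largest unsigned value, so that test is never true, while (x >= 0) is always true.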

    I'd be surprised if you found anything different for the jitted output of Java or C#. I doubt you'd find any difference of note even for a very small target like an 8-bit AVR.

    In short, don't worry about this micro-optimization. I think my write-up here has already taken more time than will be spent by any difference in the performance of these expressions accumulated across all the CPUs executing them in my lifetime. If you have the capability to measure the difference in performance, please apply your efforts to something more important like studying the behavior of sub-atomic particles or something.
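    If you want to check this on your own machine anyway, a rough timing harness could look like the sketch below. Everything in it (count_gt, count_gte, time_counter and its parameters) is a hypothetical name chosen for illustration, and clock() is a coarse timer, so treat any numbers it gives as indicative at best.

```c
#include <stddef.h>
#include <time.h>

/* Counts the elements that pass the (x > -1) test. */
size_t count_gt(const int *data, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] > -1) {
            ++hits;
        }
    }
    return hits;
}

/* Counts the elements that pass the (x >= 0) test. */
size_t count_gte(const int *data, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] >= 0) {
            ++hits;
        }
    }
    return hits;
}

/* Hypothetical helper: runs fn over the data `repeats` times, stores the
   summed count in *result, and returns the CPU time used in seconds. */
double time_counter(size_t (*fn)(const int *, size_t),
                    const int *data, size_t n, int repeats, size_t *result)
{
    size_t total = 0;
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r) {
        total += fn(data, n);
    }
    clock_t end = clock();
    *result = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

    Calling time_counter once with count_gt and once with count_gte on the same large array gives two times to compare. An optimizing compiler may well turn both loops into the same instructions, in which case any difference you see is measurement noise.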