c++functionbinary-searchstdarray

What is the difference between [start/2 + mid/2] and [(start + mid)/2] in binary search?


In the binary search algorithm, we set the mid as:

mid = (start + end)/2 which is same as

mid = start/2 + end/2 and also equal to

mid = start + (end - start)/2

but all of the three give different results being the same arithmetic expression. How do these vary during the computation process?

This was the code to find the last occurrence of an element in a vector array using binary search:

int lastOccurrence(vector<int> arr, int size, int key){
    int start = 0, end = size - 1;
    int last = -1;
    // int mid = start/2 + end/2;
    int mid;
    while(start <= end){
        // mid = start + (end - start)/2;
        mid = (start + end)/2;
        if(key == arr[mid]){
            last = mid;
            start = mid + 1;
        }
        else if(key > arr[mid]){
            start = mid + 1;
        }
        else{
            end = mid - 1;
        }
        cout << "start: " << start << "\tend: " << end << "\tmid: " << mid << endl;
    }
    return last;
}

The values being passed to the function are:

int main(){
    vector<int> arr = {1,2,3,4,4,4,4,5,6,7,11};
    int size = arr.size();
    int key = 4;
    cout << "First occurrence of " << key << " is at index " << firstOccurrence(arr, size, key) << endl;

    cout << "Last occurrence of " << key << " is at index " << lastOccurrence(arr, size, key) << endl;

    return 0;
}

If the mid element equals the required "key" element then the mid index is stored in a variable and start is updated to mid + 1 so that it can search the right part of the array for any other occurrence of the "key". If the "key" is found to be less than the mid element it implies that the element is not present beyond the mid element and the end is updated to mid - 1 to search in the left part of the array, and similarly to search the right part if "key" is found to be greater than the mid element.

It gave different results when mid = start/2 + end/2 was used and mid = (start + end)/2 was used. How is this affected during the computation process?


Solution

  • You need to consider that integer arithmetic cuts off any fractional parts, hence depending on the last bit of start and stop you get different results.

    Suppose they are

     start = M*2 + a;
     end = N*2 + b;
    

    Where M and N are integers and a and b are either 1 or 0, then you get

    mid_0 = (start + end)/2 = M+N + (a+b) / 2
    mid_1 = start/2 + end/2 = M+N
    mid = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2 
    

    Only the second expression does not depend on whether start or end are even or odd. I didn't actually bother to work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic you better do not rely on intuition, because it's too easy to be off by one (or more). Moreover, I did not yet consider integer overflow. When start+end overflows then (start/2) + (end/2) does not.