[SOLVED] Why does my binary search need an extra comparison? log2(N)+1

Why does my binary search need an extra comparison? log2(N)+1

I want to find the index of the first integer in an array of integers which is <= key. I can do it with binary search in log2(N)+1 compares. Shouldn't it be possible with only log2(N) compares?

// Returns the index of the first integer in keys <= key. size must be a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;

    for (int i = 0; i < log2(size); i++) {
        size /= 2;

        if (keys[size] < key)
            keys += size;
    }

    unsigned i = unsigned(keys - origKeys);
    // Not only is this step messy, because it doesn't fit the loop, but
    // now we have log2(n) + 1 comparisons.
    if (*keys < key)
        i++;

    return i;
}

Solution

Let's think about this from an information-theoretic perspective. If you have an array with n elements in it, there are n+1 possible spots where the new element can go: just before any element of the array, or after all the elements. Therefore, your algorithm needs to make enough comparisons to be able to uniquely identify which of the n+1 spots is the right one. If you don't make enough comparisons, the answer you give back won't always be right.

Every comparison you make can, in the best case, eliminate half of the possible positions. Therefore, in the theoretical limit, with k comparisons you can decide which of 2^k positions is right. Since there are n+1 possible positions, you need to make lg (n+1) comparisons in the worst case, rather than lg n. Since your n is a perfect power of two, this means that one extra comparison is required. On the other hand, if you have n be one less than a perfect power of two, then making ceil(lg n) comparisons should suffice.

Edited by Eloff, this code seems to give the right answer with log2(n+1) steps as you predicted:

// Returns the index of the first integer in keys <= key. size must be one less than a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;
    size++;

    while(size > 1) {
        size /= 2;

        if (keys[size-1] < key)
            keys += size;
    }

    return unsigned(keys - origKeys);        
}