arraysalgorithmsortinglinked-list

O(log n) algorithm to find best insert position in sorted array


I'm trying to make an algorithm that finds the best position to insert the target into the already sorted array.

The goal is to either return the position of the item if it exists in the list, else return the position it would go into to keep the list sorted.

So say I have a list:

   0   1   2   3   4    5    6
 ---------------------------------
 | 1 | 2 | 4 | 9 | 10 | 39 | 100 |
 ---------------------------------

And my target item is 14 It should return an index position of 5

Pseudo-code I currently have:

array = generateSomeArrayOfOrderedNumbers()

number findBestIndex(target, start, end)
    mid = abs(end - start) / 2

    if (mid < 2) 
        // Not really sure what to put here
        return start + 1 // ??

    if (target < array[mid])
        // The target belongs on the left side of our list //
        return findBestIndex(target, start, mid - 1)
    else
        // The target belongs on the right side of our list //
        return findBestIndex(target, mid + 1, end)

I not really sure what to put at this point. I tried to take a binary search approach to this, but this is the best I could come up with after 5 rewrites or so.


Solution

  • There's several problems with your code:

    mid = abs(end - start) / 2
    

    This is not the middle between start and end, it's half the distance between them (rounded down to an integer). Later you use it like it was indeed a valid index:

    findBestIndex(target, start, mid - 1)
    

    Which it is not. You probably meant to use mid = (start + end) // 2 or something here. You also miss a few indices because you skip over the mid:

    return findBestIndex(target, start, mid - 1)
     ...
    return findBestIndex(target, mid + 1, end)
    

    Your base case must now be expressed a bit differently as well. A good candidate is the condition

    if start == end
    

    Because now you definitely know you're finished searching. Note that you also should consider the case where all the array elements are smaller than target, so you need to insert it at the end.

    I don't often search binary, but if I do, this is how

    Binary search is something that is surprisingly hard to get right if you've never done it before. I usually use the following pattern if I do a binary search:

    lo, hi = 0, n // [lo, hi] is the search range, but hi will never be inspected.
    while lo < hi:
        mid = (lo + hi) // 2
        if check(mid): hi = mid
        else:          lo = mid + 1
    

    Under the condition that check is a monotone binary predicate (it is always false up to some point and true from that point on), after this loop, lo == hi will be the first number in the range [0..n] with check(lo) == true. check(n) is implicitely assumed to be true (that's part of the magic of this approach).

    So what is a monotone predicate that is true for all indices including and after our target position and false for all positions before?

    If we think about it, we want to find the first number in the array that is larger than our target, so we just plug that in and we're good to go:

    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if (a[mid] > target): hi = mid
        else:                 lo = mid + 1
    return lo;