pythonbinary-searchlinear-search

Binary search vs linear search


I have the following implementation of binary search:

import time
numbers_one_to_hundred=[]
for num in range(1,100001,2): #25 66 35 64 (38,101) (1,101)
    numbers_one_to_hundred.append(num)

print('Please choose a number:')    
my_number=int(input())

print('Length array:',len(numbers_one_to_hundred))
#print('array:',numbers_one_to_hundred)
my_start_time = time.time()
start=0
end=len(numbers_one_to_hundred)
temp_arr=numbers_one_to_hundred
index_start=start
index_end=end
index_middle=int((index_end+index_start)/2)
found=False
while(not(start==0 and start==end)):
  if temp_arr[int((end+start)/2)]==my_number: #middle temp is my number
        print('you find',temp_arr[int((end+start)/2)],'index:',index_middle)
        found=True
        break
  elif my_number<temp_arr[int((end+start)/2)]: #number in first half
        end=int((start+end)/2)
        temp_arr=temp_arr[start:end] 
        start=0
        index_end=index_middle+1
        index_middle=int((index_middle+index_start)/2)
  elif my_number>temp_arr[int((end+start)/2)]: #number in second half
        start=int((end+start)/2)        
        temp_arr=temp_arr[start+1:end] 
        start=0
        end=int((end-1)/2) 
        index_start=index_middle+1
        index_middle=int((index_end+index_middle)/2)

if not(found): #number not in list
  print('Number not in list!')
  
my_end_time = time.time()

print('passed:',my_end_time-my_start_time)
  

When the input is for example 50003,this is the result:

Please choose a number:
50003
Length array: 50000
you find 50003 index: 25001
passed: 0.0028307437896728516

This is a code for linear search:

import time
numbers_one_to_hundred=[]
for num in range(1,100001,2): #25 66 35 64 (38,101) (1,101)
    numbers_one_to_hundred.append(num)

print('Please choose a number:')    
my_number=int(input())

print('Length array:',len(numbers_one_to_hundred))

my_start_time=time.time()  
if my_number in numbers_one_to_hundred:
    print('SUCCESS!')

else:
    print('NOT SUCCESS!')

my_end_time = time.time()    
print('passed:',my_end_time-my_start_time)

And result for this search is:

Please choose a number:
50003
Length array: 50000
SUCCESS!
passed: 0.0006299018859863281

My question is,why the advantage is to the linear search in this case?(and pretty big difference) Is there problem with one of the implementations? Or the input is the problem? (I tried 3-4 different input and in all of them the advantage was to the linear search) Or something else?


Solution

  • You binary search copies parts of the list in lines such as temp_arr=temp_arr[start:end]. Since this copying happens in exponential decay parts (e.g. if original list is 128, then the next copy will be 64, and the one after that 32, etc), whose total cost is O(N) over the whole run of the algorithm.

    EDIT: clarification regarding the total cost of the exponential decay mentioned above. The total cost of copying the list parts over all the iterations is N /2 + N /4 + N /8 + ... + 1, which is a geometric series, which equals N-1. Since usually the list's length is not a power of two, there will be a slight variation of the total cost, but it will still be O(N).

    Although both the inefficient implementation of "binary search" and linear search, are O(N), the constant factor is higher for the "binary search" since it uses many more operations than linear search to achieve its goal. So overall your "binary search" is slower.

    To make the code run faster, find a way to avoid creating new lists. A possible solution is to change the code such that start and end point to the original numbers_one_to_hundred list, without ever creating temp_arr

    EDIT: Clarificaiton on improvments: Without copying partial lists, the binary search should be significantly faster for any list with enough elements. The cut-off size is likely between N=10 and N=100, when I'll guess the number to be closer to 20 for correctly implemented binary search in python.