pythonpython-3.xlistmatplotlibtime

Trying to time the "in" operator


I have a question about how the in operator works internally. So to test it I tried to graph the time it takes to find a member of a list, in different parts of the list.

import time
import matplotlib.pyplot as plt

# list size
size = 100000
# number of points
n = 100000

# display for not having to count zeros
if size >= 1000000:
    new_size = str(int(size/1000000))+"M"
else:
    new_size = str(int(size/1000))+"K"

if n >= 1000000:
    new_n = str(int(n/10000000))+"M"
elif n >= 1000:
    new_n = str(int(n/1000))+"K"
else:
    new_n = n

lst = list(range(size))
result = []
for number in range(0,size+1,int(size/n)):
    start_time = time.time_ns()
    if number in lst:
        end_time = time.time_ns()
        total_time = (end_time-start_time)/1000000000 #convert ns to seconds
        result.append([total_time,number])
        print(number,total_time)

x_values, y_values = zip(*result)
plt.scatter(x_values, y_values, c='red', marker='o',s=5)
plt.xlabel('Time (sec)')
plt.ylabel('Number')
plt.title(f'List length: {new_size}\nNumber of points: {new_n}\n\nTime to find number in list')
plt.grid(True)
plt.show()

From what I saw, in works internally by calling iter which sequentially takes all elements of the iterable, starting from the first. So I would expect that searching further in a list takes more and more time.
The list has to be above a certain length so that python takes a long enough time to find the number that we can measure, but fiddling around with different list sizes and number of points I found out that the points cluster around vertical lines on multiples of 0.001s, which is very bizzare to me. I could see some horizontal lines forming due to how python works internally, but not vertical ones. I even used time.time_ns() to have even more time precision, but it still happens.
I mean, how can it take the same time to find 539 and 94,598 inside a [0,1, ... 100000] list if it always starts from 0?

100,000 points


Solution

  • This test seems to suggest to me that searching for a number in a list takes longer and longer based on where the item falls in the list.

    import timeit
    
    setup = """
    size = 1_000_000
    size_minus_one = size - 1
    data = list(range(size))
    """
    
    print(timeit.timeit("0 in data", setup=setup, number=1_000))
    print(timeit.timeit("size_minus_one in data", setup=setup, number=1_000))
    

    I assume some people are not satisfied with that answer as the vertical bars are not "explained". Let's do that as well.

    The vertical bars are an artifact of the resolution of the time.time_ns(). Using time.perf_counter_ns() as recommended by @no-comment gives us the following result confirming that the time to find an item in a list is directly proportional to the location of the item in the list.

    this result

    Via this code:

    import time
    import matplotlib.pyplot as plt
    
    # list size
    size = 100000
    # number of points
    n = 100000
    
    # display for not having to count zeros
    if size >= 1000000:
        new_size = str(int(size/1000000))+"M"
    else:
        new_size = str(int(size/1000))+"K"
    
    if n >= 1000000:
        new_n = str(int(n/10000000))+"M"
    elif n >= 1000:
        new_n = str(int(n/1000))+"K"
    else:
        new_n = n
    
    clock = time.perf_counter_ns
    lst = list(range(size))
    result = []
    for number in range(0,size+1, int(size/n)):
        start_time = clock()
        if number in lst:
            result.append([clock() - start_time, number])
    
    x_values, y_values = zip(*result)
    plt.scatter(x_values, y_values, c='red', marker='o',s=5)
    plt.xlabel('Time')
    plt.ylabel('Number')
    plt.title(f'List length: {new_size}\nNumber of points: {new_n}\n\nTime to find number in list')
    plt.grid(True)
    plt.show()