arrays · algorithm · data-structures · divide-and-conquer · master-theorem

How to find the time complexity when comparing sublists?




def check_lists(list_1, list_2):
    # A None argument means a mismatch was already found further down
    if list_1 is None or list_2 is None:
        return None

    comparison_result = True

    # O(k) scan: compare the two lists element by element
    for i in range(len(list_1)):
        if list_1[i] != list_2[i]:
            comparison_result = False
            break

    if comparison_result:
        return list_1
    else:
        return None

def check_sublists(lst, fir, la):
    mid = (la + fir) // 2

    # Base case: a single sublist, nothing to compare
    if la - fir == 0:
        return lst[fir]

    # Base case: exactly two sublists, compare them directly
    if la - fir == 1:
        return check_lists(lst[fir], lst[la])

    # Recursive case: check each half, then compare the two results
    first_half_result = check_sublists(lst, fir, mid)
    second_half_result = check_sublists(lst, mid + 1, la)
    return check_lists(first_half_result, second_half_result)


lst = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]

result = check_sublists(lst, 0, len(lst) - 1)

print(result)


The function check_lists compares two lists with a for loop, so it is O(k); here it is used to compare the sublists with each other. The point is to check whether all sublists are the same. check_sublists is the function that actually compares the sublists recursively. Using the master theorem I get a = b = 2. The largest sublist size is given by k, and the size of the nested list is n; I am supposed to describe the time complexity with these two variables. But with the master theorem I am stuck, because f(n) depends on k: the non-recursive part is k plus constants. How am I supposed to do this?


Solution

  • The worst case materialises when all elements of the top-level list are equal, since then the None condition in check_lists is never true.

    Reasoning

    The base case of the recursion is hit when the slice has only 1 or 2 elements. In the latter case, check_lists compares two adjacent elements of the top-level list.

    The recursive case also calls check_lists, with one element taken from the left partition and one from the right partition. It doesn't really matter which element from each partition gets selected/returned: when we get an element back, it means all elements in that partition were the same anyway, so we can imagine it was the rightmost element of the left partition and the leftmost element of the right partition.

    If we account for the elements in that way, we can see that every pair of adjacent elements of the top-level list is subject to exactly one call of check_lists. In a list of 𝑛 elements there are 𝑛−1 adjacent pairs, so we can conclude that exactly 𝑛−1 calls of check_lists are made. As each call has a worst-case time complexity of O(𝑘), the overall time complexity is O(𝑛𝑘).
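
    As a quick sanity check of that 𝑛−1 count, here is a small test harness (my own sketch, not part of the question's code) that wraps check_lists with a counter; it assumes the question's check_lists and check_sublists are defined at module level:

    calls = 0
    _original_check_lists = check_lists  # keep a reference to the real function

    def check_lists_counted(list_1, list_2):
        # Counting wrapper: check_sublists resolves the name check_lists
        # at call time, so rebinding it below routes calls through here.
        global calls
        calls += 1
        return _original_check_lists(list_1, list_2)

    check_lists = check_lists_counted

    for n in (2, 3, 8, 13):
        calls = 0
        check_sublists([[1, 2, 3]] * n, 0, n - 1)
        print(n, calls)  # prints n - 1 calls for every n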

    Formal approach

    Before getting into the proof, I suggest removing the handling of the base case where la - fir == 1 from the code, because the recursive case deals with it correctly: the two recursive calls will in that case both hit the other base case (la - fir == 0) and so accomplish the same result.

    So we then have this recursive function:

    def check_sublists(lst, fir, la):
        if la == fir:
            return lst[fir]
        mid = (la + fir) // 2
        return check_lists(check_sublists(lst, fir, mid),
                           check_sublists(lst, mid + 1, la))
    

    When la and fir differ by 1, this version will make one more recursive call than your original code. But as such an extra call has O(1) cost (it immediately hits the base case), this does not impact the overall time complexity, and it makes the recursion tree easier to reason about.

    The recursion tree (whose nodes are the distinct calls of check_sublists) is now a full binary tree:
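
    For example, with four sublists (indices 0 through 3) the tree looks like this (a hand-drawn sketch; each node is labelled with its fir, la range):

                     (0,3)
                    /     \
               (0,1)       (2,3)
               /   \       /   \
           (0,0) (1,1) (2,2) (3,3)

    Each internal node performs exactly one call of check_lists on the values returned by its two children, and a full binary tree with 𝑛 leaves has 𝑛−1 internal nodes, matching the 𝑛−1 calls counted earlier.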

    We thus have this recurrence relation, where 𝑇(𝑖) represents the time complexity for a list slice of size 𝑖, i.e. where 𝑖 is la - fir + 1:

          𝑇(1) = O(1)

          𝑇(𝑛) = 2𝑇(𝑛/2) + O(𝑘)

    Using the substitution method, and assuming that 𝑛 is a power of 2, we can expand one step like this:

          = 2[2𝑇(𝑛/4) + O(𝑘)] + O(𝑘)

    If we expand 𝑖−1 times, we get:

          = 2^𝑖 𝑇(𝑛/2^𝑖) + (2^0 + 2^1 + 2^2 + ... + 2^(𝑖−1)) O(𝑘)

    If we expand until we reach 𝑇(1), i.e. when 𝑖 = log₂𝑛, we get:

          = 2^(log₂𝑛) 𝑇(1) + (2^0 + 2^1 + 2^2 + 2^3 + ... + 2^(log₂𝑛 − 1)) O(𝑘)

    Now 2^(log₂𝑛) is just 𝑛, and the sum of powers of 2 is twice the last term minus 1, so we get:

          = 𝑛O(1) + (2·(𝑛/2) − 1)O(𝑘)

          = O(𝑛) + O(𝑛)O(𝑘)

          = O(𝑛𝑘)
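
    Finally, if you want to see the O(𝑛𝑘) bound empirically, here is a self-contained sketch (my own instrumentation, not part of the original code) that counts element comparisons in the worst case, i.e. when all sublists are equal:

    # Count element comparisons made by the simplified recursion above.
    def count_comparisons(n, k):
        comparisons = 0

        def check_lists(a, b):
            nonlocal comparisons
            if a is None or b is None:
                return None
            for i in range(len(a)):
                comparisons += 1
                if a[i] != b[i]:
                    return None
            return a

        def check_sublists(lst, fir, la):
            if la == fir:
                return lst[fir]
            mid = (la + fir) // 2
            return check_lists(check_sublists(lst, fir, mid),
                               check_sublists(lst, mid + 1, la))

        # Worst case: n identical sublists of length k
        check_sublists([list(range(k))] * n, 0, n - 1)
        return comparisons

    for n, k in [(4, 8), (8, 8), (8, 16), (16, 16)]:
        print(n, k, count_comparisons(n, k))

    In the worst case each of the 𝑛−1 calls scans all 𝑘 elements, so the printed counts come out to exactly (𝑛−1)·𝑘, in line with the O(𝑛𝑘) result derived above.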