Tags: python, big-o, space-complexity

Auxiliary Space Complexity Across Iterations


Suppose we have the function below:

def ReverseStr(s, k):
    """
    s: list of characters (length n)
    k: integer
    """
    for i in range(0, len(s), 2*k):
        s = s[:i] + s[i:i+k][::-1] + s[i+k:]
    return s

In this function, each iteration creates four distinct sub-lists. First, we create a sub-list of length i. Then we create a sub-list of length k, and another sub-list of length k when we reverse it. Finally, we create a sub-list of length n - (i + k). So, in total, we create sub-lists occupying i + k + k + n - (i + k) space or, simplified, n + k.

Space complexity is defined as the maximum amount of space that a function occupies at any given point. While this definition makes sense to me, I am struggling to understand what happens on successive iterations. For instance, suppose we are on the fourth iteration. Are the sub-lists created on the first, second, and third iterations still stored in memory somewhere? If so, then our space complexity analysis must account for memory accumulating across iterations. If not, then we know that the maximum space consumed at any given point occurs during a single, arbitrary iteration: O(n + k).
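One way to probe this empirically is with the standard-library tracemalloc module. This is a rough sketch (exact byte counts vary by interpreter and platform): if temporaries accumulated across iterations, the peak would grow with the iteration count n / 2k; instead it should grow only with n.

```python
import tracemalloc

def ReverseStr(s, k):
    for i in range(0, len(s), 2*k):
        s = s[:i] + s[i:i+k][::-1] + s[i+k:]
    return s

def peak_bytes(n, k):
    s = list("x" * n)          # allocated before tracing starts, so not counted
    tracemalloc.start()
    ReverseStr(s, k)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Doubling n roughly doubles the peak; the number of iterations does not accumulate.
print(peak_bytes(100_000, 10), peak_bytes(200_000, 10))
```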


Solution

  • each iteration involves creating four distinct sublists.

    In fact, there are also the lists created by executing the + operator (list concatenation).

    So, in total, we create sub-lists with a total space of i + k + k + n - (i + k) or, simplified, n + k.

    When determining auxiliary space complexity, it is common practice to discard the memory that can be garbage collected, even though in practice the garbage collector may not free the memory immediately.

    We start out with the memory for s, which has n elements. But this does not belong to auxiliary space, so we can discard it. Also the memory for the list that will be returned at the end could be excluded from what we call auxiliary memory (it depends on how you define it).

    The order of execution is like this:

     s[:i]
    

    This allocates memory for a list of length i

     s[i:i+k]
    

    This allocates memory for a list of length k

    ...[::-1]
    

    This allocates memory for another list of length k. After this allocation, the original list (from the previous step) is no longer referenced, so it can be garbage collected (after a peak of 2k). We could therefore conclude this is a break-even operation that does not increase the memory used by referenced objects.
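    A small illustration of this point (variable names are made up): the slice produced inside s[i:i+k][::-1] only ever exists as an anonymous temporary, so CPython's reference counting can reclaim it as soon as the reversed copy is built.

```python
a = list(range(10))

# Written out in two steps, both lists stay referenced:
chunk = a[2:5]        # new list of length 3
rev = chunk[::-1]     # another new list of length 3

# Written as one expression, as in s[i:i+k][::-1], the list produced by
# a[2:5] loses its last reference the moment [::-1] returns, so CPython
# frees it immediately -- the "break-even" described above.
rev2 = a[2:5][::-1]

print(rev, rev2)  # [4, 3, 2] [4, 3, 2]
```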

    ... + ...
    

    The first two terms are concatenated, which creates a list of length i + k. Here too, the operands of this concatenation are no longer referenced, so after a temporary memory peak this is rather a break-even operation (ignoring the per-list overhead -- we went from two lists to just one).

    s[i+k:]
    

    This allocates memory for a list of length n - i - k. So we are at a total auxiliary memory of n now -- in terms of the number of object references held by these lists, again ignoring the overhead of having two active lists.

    ... + ...
    

    This is the second concatenation. Once allocated, we have a peak of 2n, and then the memory for the operands will become garbage collectable, and only the resulting list of n elements remains usable.

    Then finally the assignment to s kicks in, which disposes of the original value of s (except when it was the original version that the caller has -- in the first iteration), reducing the "active" memory to n object references, which arguably stop being "auxiliary", as they now are referenced by s which eventually will be the returned list.

    So the peak is at 2n object references of auxiliary memory, excluding the per-list overhead of at most two lists.

    The auxiliary memory complexity is thus O(2n) = O(n).

    Improvement

    You could reduce the auxiliary memory complexity to O(k) as follows:

    def ReverseStr(s, k):
        for i in range(0, len(s), 2*k):
            s[i:i+k] = s[i+k-1:i-1 if i else None:-1]
        return s
    

    Here s is modified in place. The reversed slice is created in one operation, requiring O(k) auxiliary memory, and once the assignment completes this auxiliary memory is discarded again.
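    As a sanity check (the helper name reverse_str_naive is made up for this comparison), the in-place version can be verified against the original on a range of inputs, including partial final chunks:

```python
def reverse_str_naive(s, k):
    # the original, allocation-heavy version
    for i in range(0, len(s), 2*k):
        s = s[:i] + s[i:i+k][::-1] + s[i+k:]
    return s

def ReverseStr(s, k):
    # the in-place version
    for i in range(0, len(s), 2*k):
        s[i:i+k] = s[i+k-1:i-1 if i else None:-1]
    return s

for n in range(1, 12):
    for k in range(1, 6):
        data = list(range(n))
        assert ReverseStr(data[:], k) == reverse_str_naive(data[:], k)
print("all cases agree")
```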

    If it is required that the caller's list not be mutated, then just start by making a copy, which distinguishes the returned list from the input list (neither counted as auxiliary memory):

    s = s[:]
    

    You can bring down the auxiliary memory usage even more by copying values one by one, not in slices:

    def ReverseStr(s, k):
        for i in range(0, len(s), 2*k):
            m = min(k, len(s) - i)  # the final chunk may be shorter than k
            for j in range(m//2):
                s[i+j], s[i+m-1-j] = s[i+m-1-j], s[i+j]  # swap in place
        return s
    

    This has an auxiliary memory complexity of O(1). But it is only interesting in theory: it runs slower than the previous implementation because it performs the inner iteration in Python code instead of the slicing performed by compiled code.
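    A rough timing comparison sketches this trade-off (function names are made up for the comparison, and absolute timings vary by machine):

```python
import timeit

def reverse_slices(s, k):
    # O(k) auxiliary memory, slicing done by compiled code
    for i in range(0, len(s), 2*k):
        s[i:i+k] = s[i+k-1:i-1 if i else None:-1]
    return s

def reverse_swaps(s, k):
    # O(1) auxiliary memory, element-by-element swaps in Python code
    for i in range(0, len(s), 2*k):
        m = min(k, len(s) - i)
        for j in range(m//2):
            s[i+j], s[i+m-1-j] = s[i+m-1-j], s[i+j]
    return s

data = list(range(100_000))
k = 50
assert reverse_slices(data[:], k) == reverse_swaps(data[:], k)

t_slices = timeit.timeit(lambda: reverse_slices(data[:], k), number=20)
t_swaps = timeit.timeit(lambda: reverse_swaps(data[:], k), number=20)
print(f"slices: {t_slices:.3f}s  swaps: {t_swaps:.3f}s")
```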