c++algorithmtime-complexityheapavl-tree

Find shortest subarrays A[0:L], B[0:L] where M different elements in A are bigger than M different elements in B (time complexity)


I need to find minimum subarray of size L, A[0:L], B[0:L] such that there are M different elements in A that are bigger than M different elements in B. Like A[i] > B[j] counts but I cannot use A[i] or B[j] again. Also subarrays must start from the beginning of array.

The homework is about AVLTrees and Heap so I guess a solution would involve one of them (For the example below, I used the priority_queue from stl but actually I am not allowed to use any container from any library so I already have min-heap implementation but for easy-understanding I used the stl equivalent). Also I am expected to solve the question using AVL Tree or Heap.

Ps. I have found out that arrays can contain duplicate elements.

The size of A and B are the same which is: N.

I need to find a solution that has time complexity O(N * logN * logM)

#include <iostream>
#include <queue>
#include <span>

int minLenOfSubarrays(std::span<int> aArr, std::span<int> bArr, int M)
{
    const int N = aArr.size();
    int count = 0;
    int L = 0;

    int left = M;
    int right = N;
    while(left <= right){ //log N
        //Heap a; //min-heap
        //Heap b; //min-heap
        int mid = left+(right-left)/2;
        std::priority_queue<int,std::vector<int>,std::greater<int>> a;
        std::priority_queue<int,std::vector<int>,std::greater<int>> b;
        count = 0;
        for(int j = 0; j < mid; j++){ //N log N
            a.push(aArr[j]);
            b.push(bArr[j]);
        }
        while(!a.empty()){ // N log N
            if(a.top() > b.top()){
                count++;
                a.pop();
                b.pop();
            }
            else{
                a.pop();
            }
        }
        if(count >= M){
            L = mid;
            right = mid-1;
        }
        else{
            left = mid+1;
        }
    }
    return L;
}

int main(int argc, char* argv[]){

    int aArr[] = {2,4,10,6,1,11};
    int bArr[] = {3,5,8,9,7,12};
    int M = 3;
    
    std::cout << minLenOfSubbarays(aArr, bArr, M) << '\n';
}

I have tried a solution using min-heaps and binary search but the complexity that I could reach is max O(N*logN*logN).


Solution

  • In your binary search, for each mid do this:

    1. Compute the M largest values in A[0:mid], in ascending order. Can be done in O(mid * logM) with an AVL tree or a heap of size M. Which is also O(N * logM).
    2. Compute the M smallest values in B[0:mid], in ascending order. Again O(N * logM).
    3. Go through them in parallel, checking whether each A-value is larger than the B-value. Takes O(M), which is also O(N).

    The binary search contributes the remaining factor logN.

    Bonus: You can make it O(N * logN) by sorting all A-values and all B-values before the binary search and then for each mid you get the above A-values and B-values of steps 1 and 2 by instead filtering those already sorted values. In "executable pseudocode":

    from bisect import bisect
    
    A = 2,4,10,6,1,11
    B = 3,5,8,9,7,12
    M = 3
    
    # Sort indices by their values
    N = len(A)
    Ai = sorted(range(N), key=A.__getitem__)
    Bi = sorted(range(N), key=B.__getitem__)
    
    # Check whether A[:L] and B[:L] have the desired property 
    def enough(L):
        largeA = [A[i] for i in Ai if i < L][-M:]
        smallB = [B[i] for i in Bi if i < L][:M]
        return all(a > b for a, b in zip(largeA, smallB))
    
    # Binary search, find the smallest L from M to N that's enough
    L = bisect(range(N), False, M, N, key=enough)
    print(L)
    

    (This assumes there always is a valid L, otherwise you would've specified what to do if not.)

    Attempt This Online!

    I think we can even achieve O(NlogM + MlogN). Still do the binary search, but don't start from scratch for each new mid. Instead just update a tree of the M largest A-values:

    Overall you have O(N) single-value updates, each costing O(logM). And you have O(logN) binary search rounds, each costing O(M) for comparing the M pairs. (This might be inspired by Costantino's answer.)