javaarraysalgorithmsegment-tree

Is there a way to quickly find in a range contains a number that is in a given range?


So here is a problem, I am given an integer array, whose number is all distinct, let's say it is

int[] data = {21, 34, 12, 88, 54, 73};

now that I would like to see if a subarray, or a range, contains a number is in a range(which is also given). In other words, I want to see if a range of the array contains a number that is in a range. For instance, if I have a function check(int a, int b, int l, int r) where a and b is the range of the array and l and r is the range of the number.

So for the array above, check(0, 2, 20, 50) should return true since from index = 0 to 2, there is 21, 34, 12 and there is two numbers,21, 34, is in range of 20 to 50.

So another example would be check(2, 3, 20, 80) should return false since there,12, 88, is no number in range of 20, 80.

I'm thinking about using Segment Tree, since as I know, RMQ(range minimum query) can be solved by using Segment Tree, thus I think Segment Tree would also work on this problem; however, all of the "get" function of Segment Tree is "single"(Perhaps not the best word), so, I would want to know what nodes should the Segment Tree hold. Is there any algorithm that can answer each query in O(log(n)) while the "build" time is not O(n^2), where n is the size of the array?

Note: Using Segment Tree is just my own thought, any other approach is appreciated.


Solution

  • It's a bit exotic, but a persistent red-black tree, or a persistent variant of any other self-balancing tree, would work.

    A persistent data structure allows one to (time- and space-)efficiently take "snapshots" of the structure at different times, and then query those snapshots later, receiving results based on the structure's state as of the snapshot time. For this use case, the particular query we would want to do would be to count all the contained elements within a given range (which can be performed in O(log n) if each node is annotated with the number of its descendants).

    In this case, you would start with an empty structure, and at time i, insert data[i] and then store a snapshot as snapshot[i]. Then, check(a,b,l,r) would be implemented as return snapshot[b].countInRange(l,r) > snapshot[a].countInRange(l,r). That is, if there were more elements in the target range as of time b than there were as of time a, then some element in the target range must have been added between a and b and thus satisfies your constraints.

    If optimally implemented, the precomputation would take time O(n log n) and space O(n), and queries would take time O(log n).


    If you were willing to relax the O(log n) requirement for queries, a simpler and potentially more practical approach would be a 2-dimensional k-D tree. Simply insert each data[i] as the point (i, data[i]), and then do a range search for a<=x<b, l<=y<r. This gives you a query time of O(sqrt(n)), which is not as efficient, but a lot easier to code up (or to find existing code for).