I'm working on an interesting question about answering range h-index queries.
For reference, the h-index is the maximum integer h such that the author has published at least h papers that have each been cited at least h times.
You are given N papers, and the i-th paper has been cited p[i] times. There are Q queries, each giving a range [l, r] and asking for the h-index of an author who only publishes the papers from index l to r inclusive. N, Q, and all p[i] are between 1 and 2*10^5.
Here is an example input:
4 2
5 7 4 1
2 3
3 4
The output should be
2
1
Here p = [5, 7, 4, 1], 1-indexed. A query on range [2, 3] should give 2 because the 2 papers in that range have each been cited at least twice (7 and 4 times, specifically). The query on [3, 4] should give 1.
Doing a binary search on the answer over the subarray of each query is O(QN log N) in total. The code below is my attempt at solving the problem. Unfortunately, it gives the right answers but exceeds the 3 second time limit. The time limit is for the execution of the entire program, so this includes the time to read input. I also tried scanf, and
cin.tie(0); ios::sync_with_stdio(0);
to speed up I/O, but that still exceeds the time limit.
#include <iostream>

int p[200005]; // citation counts, 1-indexed

int main() {
    using namespace std;
    int n, q;
    cin >> n >> q;
    for (int i = 0; i < n; i++) {
        cin >> p[i + 1];
    }
    for (int i = 0; i < q; i++) {
        int l, r;
        cin >> l >> r;
        // Binary search for the h-index of p[l..r].
        int lo = 1, hi = n;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            // Count papers in [l, r] with at least mid citations.
            int cnt = 0;
            for (int j = l; j <= r; j++) {
                if (p[j] >= mid) {
                    cnt++;
                }
            }
            if (cnt >= mid) {
                lo = mid + 1; // an h-index of mid is achievable
            } else {
                hi = mid - 1; // mid is too large
            }
        }
        cout << hi << '\n'; // largest achievable h
    }
}
My other idea was using a segment tree, but I don't know how to merge two segment tree nodes faster than linear time. The problem here is still mainly the time complexity: I'm looking for a faster algorithm, and based on the constraints I believe the complexity per query should be better than O(N).
I'm stumped. What is a faster way to solve this within the time limit?
I can do it in O(log² N) per query. I keep thinking there must be a way to do O(log N) per query, but I haven't figured it out yet.
Here's the idea: build a segment tree where each node holds the sorted list of values that appear in its interval, where each such value v is accompanied by the count of elements in the node's interval that are greater than or equal to v, the index of the smallest value greater than or equal to v in the left child's list, and the same for the right child's list. This is a data structure of total size O(N log N).
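To make this concrete, here is a rough C++ sketch of the build step (untested, and the names Node, seg, toL, toR, build are mine, just for illustration). One simplification: it stores each node's full sorted list with duplicates, so the count of elements greater than or equal to vals[i] is simply the list size minus i and doesn't need to be stored explicitly.

#include <bits/stdc++.h>
using namespace std;

struct Node {
    vector<int> vals; // sorted values of the node's interval (with duplicates)
    vector<int> toL;  // toL[i]: index in the left child's vals of the
    vector<int> toR;  //   smallest value >= vals[i]; likewise for the right
};

int n;
vector<int> p;    // citation counts, 1-indexed
vector<Node> seg; // segment tree over positions 1..n

void build(int node, int lo, int hi) {
    if (lo == hi) {
        seg[node].vals = {p[lo]};
        return;
    }
    int mid = (lo + hi) / 2;
    build(2 * node, lo, mid);
    build(2 * node + 1, mid + 1, hi);
    const vector<int>& a = seg[2 * node].vals;
    const vector<int>& b = seg[2 * node + 1].vals;
    Node& cur = seg[node];
    cur.vals.resize(a.size() + b.size());
    merge(a.begin(), a.end(), b.begin(), b.end(), cur.vals.begin());
    // Cascading pointers: for each position i in the merged list (plus a
    // one-past-the-end slot), record where the lower bound of vals[i]
    // lands in each child's list.
    int m = cur.vals.size();
    cur.toL.assign(m + 1, (int)a.size());
    cur.toR.assign(m + 1, (int)b.size());
    for (int i = 0, ia = 0, ib = 0; i < m; i++) {
        while (ia < (int)a.size() && a[ia] < cur.vals[i]) ia++;
        while (ib < (int)b.size() && b[ib] < cur.vals[i]) ib++;
        cur.toL[i] = ia;
        cur.toR[i] = ib;
    }
}

int main() {
    p = {0, 5, 7, 4, 1}; // the sample array, 1-indexed
    n = 4;
    seg.resize(4 * n);
    build(1, 1, n);
    for (int v : seg[1].vals) cout << v << ' '; // prints: 1 4 5 7
    cout << '\n';
}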
Now, given a query interval, we can find its h-index using binary search: for each candidate value h, we check whether there are at least h values in the interval that are at least h, and depending on the answer to that subproblem we discard half of the possible values of the h-index. To answer the subproblem, we look up h in the root's list with one binary search and then walk down the tree to the appropriate nodes as in any other segment tree query. When we descend from a node to its child, we use the stored indices to jump directly from the smallest value that is greater than or equal to h in the parent node's list to the smallest value that is greater than or equal to h in the child's list, with no further searching. (This is an example of fractional cascading.) Summing the corresponding counts across the basic intervals whose union is the query interval gives the number of values in the query interval that are at least h.
The binary search for each query has O(log N) iterations, and each iteration takes O(log N) time because it involves walking down the tree to the basic intervals, which visits at most two nodes per level of the tree. That gives O(log² N) per query overall.
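Putting it together, here is a complete sketch of the solution, folding in the build from the sketch above (again untested, and all identifiers are mine). Each candidate h costs one binary search at the root; after that, the cascaded indices replace all per-node searches.

#include <bits/stdc++.h>
using namespace std;

struct Node {
    vector<int> vals; // sorted values of the node's interval (with duplicates)
    vector<int> toL;  // cascading pointers into the left child's list
    vector<int> toR;  // likewise for the right child
};

int n;
vector<int> p;    // citation counts, 1-indexed
vector<Node> seg; // segment tree over positions 1..n

void build(int node, int lo, int hi) {
    if (lo == hi) {
        seg[node].vals = {p[lo]};
        return;
    }
    int mid = (lo + hi) / 2;
    build(2 * node, lo, mid);
    build(2 * node + 1, mid + 1, hi);
    const vector<int>& a = seg[2 * node].vals;
    const vector<int>& b = seg[2 * node + 1].vals;
    Node& cur = seg[node];
    cur.vals.resize(a.size() + b.size());
    merge(a.begin(), a.end(), b.begin(), b.end(), cur.vals.begin());
    int m = cur.vals.size();
    cur.toL.assign(m + 1, (int)a.size());
    cur.toR.assign(m + 1, (int)b.size());
    for (int i = 0, ia = 0, ib = 0; i < m; i++) {
        while (ia < (int)a.size() && a[ia] < cur.vals[i]) ia++;
        while (ib < (int)b.size() && b[ib] < cur.vals[i]) ib++;
        cur.toL[i] = ia;
        cur.toR[i] = ib;
    }
}

// Number of elements >= h in p[l..r]. pos is the index of the smallest
// value >= h in the current node's sorted list; it was found by one
// binary search at the root and is cascaded down from there.
int countGE(int node, int lo, int hi, int l, int r, int pos) {
    if (r < lo || hi < l) return 0;
    if (l <= lo && hi <= r) return (int)seg[node].vals.size() - pos;
    int mid = (lo + hi) / 2;
    return countGE(2 * node, lo, mid, l, r, seg[node].toL[pos])
         + countGE(2 * node + 1, mid + 1, hi, l, r, seg[node].toR[pos]);
}

int hIndex(int l, int r) {
    int lo = 1, hi = r - l + 1;
    while (lo <= hi) {
        int h = (lo + hi) / 2;
        // One O(log N) binary search at the root per candidate h...
        int pos = lower_bound(seg[1].vals.begin(), seg[1].vals.end(), h)
                  - seg[1].vals.begin();
        // ...then an O(log N) walk that reuses pos via the pointers.
        if (countGE(1, 1, n, l, r, pos) >= h) {
            lo = h + 1;
        } else {
            hi = h - 1;
        }
    }
    return hi; // largest achievable h
}

int main() {
    cin.tie(nullptr);
    ios::sync_with_stdio(false);
    int q;
    cin >> n >> q;
    p.assign(n + 1, 0);
    for (int i = 1; i <= n; i++) cin >> p[i];
    seg.resize(4 * n);
    build(1, 1, n);
    while (q--) {
        int l, r;
        cin >> l >> r;
        cout << hIndex(l, r) << '\n';
    }
}

One thing to watch: vals, toL, and toR together hold roughly 3N log N ints, so it's worth checking that against the memory limit; if it's too much, the two pointer arrays can be packed more compactly.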