In a distributed bare-metal Apache Drill cluster, complex concurrent queries cause two issues:
They hog the cluster's resources, especially CPU. This can be controlled to some extent with Linux cgroups.
Drill seems to serve concurrent queries first-come-first-served. This means that even if the second query is very simple and should take almost no time, it has to wait until the earlier, heavy query has finished, which is not acceptable in a production environment.
My question is: is there a workaround for the second problem? If not, what alternatives in the technology stack might help in this case?
We tried changing some Apache Drill configuration parameters related to concurrent queries and queue management.
Without query queueing enabled, Drill takes the approach of unlimited concurrent execution (an approach that will soon exhaust the cluster's resources if new queries arrive rapidly enough). With queueing enabled, concurrency is capped at a configured number of queries, and "small" queries are queued separately from "big" queries. In either case, I'd never expect a big query to hold back the execution of a small one. The only scenario I can imagine is that both queries are being classified as the same size (both big, or both small) and you have reached the concurrency limit for that queue, so the second query stays queued.
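To make that scenario concrete, here is a toy Python model of two-queue admission. It is only a sketch of the behaviour described above, not Drill's actual implementation: the function `simulate`, its parameters, and the cost numbers are all made up for illustration, and the rule "cost above the threshold means big" is an assumption. In Drill itself the corresponding settings are the `exec.queue.*` options (for example `exec.queue.threshold`); check the documentation for their exact semantics and defaults.

```python
def simulate(threshold, small_slots, large_slots, queries):
    """Toy model of two-queue admission (hypothetical, not Drill code).

    queries is a list of (name, estimated_cost) in arrival order.
    A query whose cost exceeds `threshold` is treated as "large",
    otherwise as "small" (an assumption made for this sketch).
    Each queue admits queries until its slot limit is reached;
    later arrivals for that queue are reported as queued.
    """
    limits = {"small": small_slots, "large": large_slots}
    running = {"small": 0, "large": 0}
    report = []
    for name, cost in queries:
        queue = "large" if cost > threshold else "small"
        if running[queue] < limits[queue]:
            running[queue] += 1
            report.append(f"{name}: running ({queue})")
        else:
            report.append(f"{name}: queued ({queue})")
    return report


# Illustrative costs only: one heavy query arrives just before a simple one.
workload = [("heavy", 50_000), ("simple", 10)]

# Threshold separates the two queries, so they do not compete for a slot.
print(simulate(1_000, small_slots=1, large_slots=1, queries=workload))

# Threshold set too high: both count as "small", and the single small
# slot is already taken by the heavy query when the simple one arrives.
print(simulate(100_000, small_slots=1, large_slots=1, queries=workload))
```

In the first call both queries run at once because they land in different queues. In the second call the simple query is stuck behind the heavy one, which matches the symptom you describe, so the classification threshold and the per-queue limits are the first things I would check.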
It might be useful to discuss the issue further in the Apache Drill Slack.