I have a primary key composed of three columns (id_grandparent, id_parent, id_row) in a Kudu table.
I want my lookups by id_grandparent to be fast (HBase-like). I'm using Impala and Spark for the lookups; let's assume both of them push down equality predicates.
I have some questions that I can't answer with 100% certainty from reading the docs:
SELECT * FROM my_table where id_grandparent = 55
Will this query be able to use the primary key order even if I'm not providing the whole primary key (i.e., return very fast)? I'm assuming yes, because I guess the primary key is sorted by the first column, so this would be some kind of prefix scan.
SELECT * FROM my_table where id_parent = 55
Will this query be able to use any kind of optimization? Or will filtering on a non-first column (without the first column) force a full scan of all the tablets?
I've read about this here: https://kudu.apache.org/2018/09/26/index-skip-scan-optimization-in-kudu.html but I'm not sure whether it has been released.
Thanks in advance!
According to this JIRA ticket, the index skip scan optimization is still pending.
According to this documentation (the latest at the time of this answer):
Scans on multilevel partitioned tables can take advantage of partition pruning on any of the levels independently.
However, I doubt that index skip scan is implemented yet, considering the blog post was written only a few months ago.
Update: According to a reply from dev@kudu.apache.org:
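To make the difference between the two queries concrete, here is a toy in-memory sketch in Python. It is not Kudu's implementation; it only assumes that rows within a tablet are kept sorted by the composite primary key, as the question guesses. The names `ROWS`, `prefix_scan`, `full_scan` and `skip_scan` are made up for illustration. A predicate on the first key column can seek straight to the matching range, while a predicate on the second column alone either checks every row or, with a skip-scan-style approach as described in the blog post, seeks once per distinct value of the first column.

```python
from bisect import bisect_left

# Toy model of one tablet: rows sorted by the composite primary key
# (id_grandparent, id_parent, id_row). Sample data is hypothetical.
ROWS = sorted(
    (g, p, r)
    for g in (54, 55, 56)
    for p in (7, 55)
    for r in (1, 2)
)


def prefix_scan(rows, grandparent):
    """id_grandparent = X: two binary searches bound the matching range."""
    start = bisect_left(rows, (grandparent,))
    end = bisect_left(rows, (grandparent + 1,))
    return rows[start:end]


def full_scan(rows, parent):
    """id_parent = X without skip scan: every row has to be checked."""
    return [row for row in rows if row[1] == parent]


def skip_scan(rows, parent):
    """id_parent = X, skip-scan style: one seek per distinct id_grandparent."""
    out = []
    i = 0
    while i < len(rows):
        g = rows[i][0]
        # Seek to the run of rows with key prefix (g, parent).
        lo = bisect_left(rows, (g, parent), i)
        hi = bisect_left(rows, (g, parent + 1), lo)
        out.extend(rows[lo:hi])
        # Jump to the first row of the next distinct id_grandparent.
        i = bisect_left(rows, (g + 1,), hi)
    return out
```

In this model `skip_scan` returns the same rows as `full_scan`, but it only pays off when the first column has few distinct values relative to the number of rows. That is presumably the kind of trade-off a heuristic for enabling the optimization would have to weigh.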
Unfortunately the original author's internship ended last summer and nobody has taken the time to complete the work. It would definitely speed up certain types of queries. There are some concerns that in its current state it could cause a performance regression for some queries though. It could probably benefit from improvements to the heuristics it uses to decide when to enable the skip scan optimization.