The performance of my Impala swarm is unstable. Normally a query finishes in a few seconds (less than 10s), but occasionally it takes more than 40s, and this lasts for a few minutes. When that happens, the query profile shows a very high TotalRawHdfsOpenFileTime, which implies that most of the time is spent opening HDFS files.
What could be causing this, and how can I fix it?
TotalRawHdfsOpenFileTime is the time spent opening files. If you're querying HDFS, a high value often means the scan is spending its time fetching file metadata from the NameNode.
We have seen dramatic improvements in many production deployments that hit this bottleneck after enabling file handle caching: https://docs.cloudera.com/documentation/enterprise/5-15-x/topics/impala_scalability.html#scalability_file_handle_cache
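To compare a slow run against a fast one, it can help to pull the relevant counters out of the text profile. Below is a minimal Python sketch for that. The assumptions are mine, not from the thread: the "- Name: value" line layout, the helper name extract_counters, and the two CachedFileHandles* counter names (which I believe newer Impala versions report once the cache is active) may not match your version, so adjust the names to whatever your profile actually prints.

```python
import re
from collections import defaultdict

# TotalRawHdfsOpenFileTime comes from the question; the two cache counters
# are assumed names and may differ or be absent in your Impala version.
COUNTERS = (
    "TotalRawHdfsOpenFileTime",
    "CachedFileHandlesHitCount",
    "CachedFileHandlesMissCount",
)

# Assumed profile line layout: "   - CounterName(*): value"
# or "   - CounterName: value".
LINE_RE = re.compile(r"^\s*-\s*(?P<name>\w+)(?:\(\*\))?:\s*(?P<value>.+?)\s*$")


def extract_counters(profile_text, names=COUNTERS):
    """Return {counter name: [raw value strings]}, one entry per occurrence."""
    wanted = set(names)
    found = defaultdict(list)
    for line in profile_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("name") in wanted:
            found[match.group("name")].append(match.group("value"))
    return dict(found)
```

Save the profile of a slow query and of a fast query to text files, run extract_counters on the contents of each, and compare the values per scan node. As far as I know, the cache size is controlled by the impalad startup flag max_cached_file_handles; check the linked page for the default and recommended value in your release.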