Tags: hadoop, hive, parameters, hiveql, hive-configuration

When to set Hive parameters during a session?


I'm new to my role, and part of it involves creating and inserting data into both managed and external Hive tables. We have a few lines of 'set' parameters that we run at the beginning of a Hive session, but I've run into cases where, for example, the files are merged for some partitions (a small number of files) but not for others (many smaller files), seemingly on random days.

My question is: when is it necessary to enter all of my Hive set parameters? Does it need to be done for every single insert/command/statement I'm running? Or just once at the beginning of the Hive session when I've launched Hive?

These are the standard set parameters we've been using:

SET mapred.job.queue.name=yometrics;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
SET hive.merge.tezfiles=true;

Solution

  • You can put the SET statements at the beginning of the script file; they stay in effect for the whole session, so they do not need to be repeated before every statement.

    Alternatively, you can put the common parameters in a separate file, params.hql, and call it at the beginning of each script:

    source /local/path/to/the/file/params.hql

    You can also put them in hive-site.xml, which makes them the defaults for every session.

    If you are on Qubole/AWS, you can also use a bootstrap script for the same purpose: https://docs.qubole.com/en/latest/user-guide/hive/bootstrap-script.html
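As a rough sketch of the params.hql approach, a script could load the shared settings once and then run several inserts in the same session. The script name (load_events.hql), the table names (events_staging, events_managed, events_external) and the column names are hypothetical and used only for illustration; the settings are taken from the question.

```sql
-- params.hql (shared settings file; contents taken from the question)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.merge.tezfiles=true;

-- load_events.hql (hypothetical script name)
-- Load the shared settings once, at the top of the script.
source /local/path/to/the/file/params.hql;

-- SET with a key and no value prints the current value,
-- which is a quick way to confirm the setting is active in this session.
SET hive.merge.tezfiles;

-- Both inserts run in the same session, so both use the settings above.
-- Table and column names are hypothetical.
INSERT OVERWRITE TABLE events_managed PARTITION (dt)
SELECT user_id, event_type, dt FROM events_staging;

INSERT OVERWRITE TABLE events_external PARTITION (dt)
SELECT user_id, event_type, dt FROM events_staging;
```

This assumes the Hive CLI, where source runs a script file inside the current session; other clients may load script files differently. If a later statement in the same session sets a parameter to a different value, the new value applies from that point on.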