I often have a large block of HiveQL that I want to run multiple times with different settings for some variables.
A simple example would be:
set mindate='2015-01-01 00:00:00'
set maxdate='2015-04-01 00:00:00'
select * from my_table where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate}
Which is then run via hive -f myfile.sql > myout.log
Later, I would like to change the variables and re-run. I also want a record of what values the variables had each time I ran.
So I currently make copies of the HiveQL file that are the same except for the variable values. This is obviously error-prone, however, because if I need to change the actual HiveQL, then I have to change it in every file.
Ideally, I could store all my settings a JSON file (or whatever) and have my HiveQL file be totally dynamic. Is there any way to do this?
Set your variables in the config file and load this file in your hql script:
source /path_to_your_config_file/config.hql;