So we have a group of people hitting our cluster and would like to monitor every SQL statement being run via hive/odbc. The job history server web page will give me part of the SQL but not everything. Is there a way to retrieve the full SQL of commands being run? Could be via web GUI or command line. I want to monitor for inefficient queries and send out warnings to developers.
Cluster is 10 node MapR cluster running Mapr 5.0 in yarn mode
I found the solution. SSH into the machine running hiveserver2 and execute the command below. Replace with the user you are interested in (or * if you want them all) and replace hive-0.13 with the version of hive you are using.
cat /opt/mapr/hive/hive-0.13/logs/<user>/hive.log | grep "Starting command"
EDIT: in the year since I wrote this is appears it has changed to:
cat /opt/mapr/hive/hive-2.1/logs/<user>/hive.log | grep "Executing command"
EDIT Take 2. The above doesn't deal with multi-line sql and shows only the first line. For many of the queries it was simply showing "SELECT" and nothing more. The below solves the issue. I had a nice sed solution but it didn't work with tail -f
for some reason, so Perl it is. Unusual for Perl but it's actually more readable than the sed solution.
cat /opt/mapr/hive/hive-2.3/logs/<user>/hive.log | perl -ne 's/\r\n/ /g; print;' | grep "Executing command"