hivehive-metastore

Can we extract the queries which was ran on hive through metadata


Basically, I want metadata of queries that ran on the hive in one day. I looked into the metadata which is provided by hive in MySql. But not able to find any table which stores query related information.


Solution

  • After doing some research found that we can extract the MapReduce jobs using the History Server REST API for Hadoop.

    And From that you'll the the Job related information.

    To get the query you need request for particular job's conf

    <history_server_http_address:port>/ws/v1/history/mapreduce/jobs/<JOB_ID>/conf

    From this you'll get all configs. For query you need look hive.query.string

    https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html#History_Server_REST_APIs.

    We can also extract the job's JSON and configuration of that job in XML from the hdfs location. For this, you need the value of the mapreduce.jobhistory.done-dir property.

    Then you fire hdfs get command to get the data.

    hdfs dfs -get <resource-manager-path>/<year-dir>/<month-dir>/<day-dir>  <destination-local-dir>