sqlhiveqliksensehive-query

FETCH the latest partition from HIVE table


Hi I am very much new to this. I have three columns YEAR, MONTH,DAY in INTEGER format.

I want to load the script and combine YEAR,MONTH,DAY as single column and fetch the maximum.

I tried like,

Load year,month,date from HIVE.`abc`.`abc1';
SELECT max(cast(year as String) || '_' || cast(month as string) || '_' || cast(day as string)) as result FROM HIVE.`abc`.`abc1';

By doing this I will get the result as 2020_5_21. But I should use the separator and find the max of the date.

The following error occurred: Connector reply error: SQL##f - SqlState: S1000, ErrorCode: 35, ErrorMsg: [Cloudera][Hardy] (35) Error from server: error code: '1' error message: 'Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask'.

I want to use the result in WHERE clause. But I don't know the statement. SQL select * from HIVE.abc.`abc1' where ---- ;

Please help.


Solution

  • If month and day are stored as integers, you need to use lpad() to add zero if it is single digit month or day. For example month 5 should become 05. Without this max may work incorrectly. Also use dash as a separator to have date in compatible format.

    max(concat(year,'-',lpad(month, 2,0),'-',lpad(day, 2,0)))
    

    And to use it in the WHERE use WHERE date in (select max ...):

    SELECT * from your_table
    WHERE concat(year,'-',lpad(month, 2,0),'-',lpad(day, 2,0)) in (select max(concat(year,'-',lpad(month, 2,0),'-',lpad(day, 2,0))) from your_table)
    

    Also you may need to quote names like year, month, day in backticks everywhere in sql:

    max(concat(`year`,'-',lpad(`month`, 2,0),'-',lpad(`day`, 2,0)))