sqlhivehiveqltpc

Hive select query return top 100 syntax error?


Here is my Hive query, straight from the TPC-DS toolkit:

WITH customer_total_return 
     AS (SELECT sr_customer_sk AS ctr_customer_sk, 
                sr_store_sk    AS ctr_store_sk, 
                Sum(sr_fee)    AS ctr_total_return 
         FROM   store_returns, 
                date_dim 
         WHERE  sr_returned_date_sk = d_date_sk 
                AND d_year = 2000 
         GROUP  BY sr_customer_sk, 
                   sr_store_sk) 
SELECT TOP 100 c_customer_id 
FROM   customer_total_return ctr1, 
       store, 
       customer 
WHERE  ctr1.ctr_total_return > (SELECT Avg(ctr_total_return) * 1.2 
                                FROM   customer_total_return ctr2 
                                WHERE  ctr1.ctr_store_sk = ctr2.ctr_store_sk) 
       AND s_store_sk = ctr1.ctr_store_sk 
       AND s_state = 'TN' 
       AND ctr1.ctr_customer_sk = c_customer_sk 
ORDER  BY c_customer_id; 

However, I get the following error when attempting to run it:

FAILED: ParseException line 11:11 cannot recognize input near 'TOP' '100' 'c_customer_id' in selection target

My understanding is that TOP 100 is not syntactically valid in HiveQL. How can I rewrite this properly?


Solution

  • Use LIMIT instead of TOP, like this:

    WITH customer_total_return 
         AS (SELECT sr_customer_sk AS ctr_customer_sk, 
                    sr_store_sk    AS ctr_store_sk, 
                    Sum(sr_fee)    AS ctr_total_return 
             FROM   store_returns, 
                    date_dim 
             WHERE  sr_returned_date_sk = d_date_sk 
                    AND d_year = 2000 
             GROUP  BY sr_customer_sk, 
                       sr_store_sk) 
    SELECT c_customer_id 
    FROM   customer_total_return ctr1, 
           store, 
           customer 
    WHERE  ctr1.ctr_total_return > (SELECT Avg(ctr_total_return) * 1.2 
                                    FROM   customer_total_return ctr2 
                                    WHERE  ctr1.ctr_store_sk = ctr2.ctr_store_sk) 
           AND s_store_sk = ctr1.ctr_store_sk 
           AND s_state = 'TN' 
           AND ctr1.ctr_customer_sk = c_customer_sk 
    ORDER  BY c_customer_id
    LIMIT 100;