sqlhivepartition-by

HIVE SQL - row_number() OVER (partition BY


Hi I'm trying to populate the row_number using row_number() over (partition, but the results is not giving me the row number 1 where the first record is cumulative_cost >= threshold and the subsequent rows all have the same row number.

This is the result

date cost_amt cumulative_cost threshold_reached_order product_id
9/07/2023 14.09 14.09 0 12345
9/07/2023 10.2 24.29 0 12345
9/07/2023 25.03 49.32 3 12345
11/07/2023 28.09 77.41 4 12345

I'm expecting this result

date cost_amt cumulative_cost threshold_reached_order product_id
9/07/2023 14.09 14.09 0 12345
9/07/2023 10.2 24.29 0 12345
9/07/2023 25.03 49.32 1 12345
11/07/2023 28.09 77.41 2 12345
select date, cost_amt, cumulative_cost,
case when cumulative_cost >= threshold then
row_number() OVER (partition BY product_name, userid ORDER BY date, order_id) 
else 0 
end AS threshold_reached_order,
product_id
from table1
where userid = 'ABCDEFG'
order by userid , product_id

Solution

  • The row_number() OVER (partition BY product_name, userid ORDER BY date, order_id) is a window function that assigns a unique number to each row within a partition, which is determined by the product_name and userid columns. The rows are ordered by date and order_id.

    The rows are already given the number and then you are putting the condition.

    Before cumulative_cost >= threshold condition:

    date cost_amt cumulative_cost threshold_reached_order product_id
    9/07/2023 14.09 14.09 1 12345
    9/07/2023 10.2 24.29 2 12345
    9/07/2023 25.03 49.32 3 12345
    11/07/2023 28.09 77.41 4 12345

    After cumulative_cost >= threshold condition:

    date cost_amt cumulative_cost threshold_reached_order product_id
    9/07/2023 14.09 14.09 0 12345
    9/07/2023 10.2 24.29 0 12345
    9/07/2023 25.03 49.32 3 12345
    11/07/2023 28.09 77.41 4 12345

    You can try using this query:

    SELECT
      date,
      cost_amt,
      cumulative_cost,
      CASE
        WHEN cumulative_cost >= threshold THEN
          SUM(CASE WHEN cumulative_cost >= threshold THEN 1 ELSE 0 END)
          OVER (PARTITION BY product_id ORDER BY date, order_id)
        ELSE 0
      END AS threshold_reached_order,
      product_id
    FROM table1
    WHERE userid = 'ABCDEFG'
    ORDER BY userid, product_id, date, order_id;