sqlhivemaxcalculated-columnsimpala

Getting single MAX() row of Calculated Column within a Specific Time Interval in SQL


I am trying to get the MAX() of the sums of amps_a + amps_b + amps_c within the month of March for the ID column. This should return a single row for the highest sum for the month of March (or at least all the rows that equaled the highest sum if there are multiples).

I have tried the following:

SELECT 
    a.id, a.volts_a, a.volts_b, a.volts_c, 
    a.amps_a, a.amps_b, a.amps_c, 
    amps_a + amps_b + amps_c AS sum1, 
    a.phase_a_voltage_angle, a.phase_a_current_angle, 
    a.phase_b_voltage_angle, a.phase_b_current_angle, 
    a.phase_c_voltage_angle, a.phase_c_current_angle, 
    a.time_stamp, b.max1
FROM
    Table1 a
INNER JOIN 
    (SELECT id, MAX(sum1) max1, time_stamp
     FROM Table1
     WHERE id = "abcxyz"
       AND time_stamp = 202503
     GROUP BY id, time_stamp) b ON a.id = b.id 
                                AND a.amps_a = b.amps_a 
                                AND a.time_stamp = b.time_stamp

When I do this I get the following error:

AnalysisException: Could not resolve column/field reference: 'sum1'

I am able to get the MAX() function to work without the summing:

SELECT 
    a.id, a.volts_a, a.volts_b, a.volts_c, 
    a.amps_a, a.amps_b, a.amps_c, 
    a.phase_a_voltage_angle, a.phase_a_current_angle, 
    a.phase_b_voltage_angle, a.phase_b_current_angle, 
    a.phase_c_voltage_angle, a.phase_c_current_angle, a.time_stamp
FROM
    Table1 a
INNER JOIN 
    (SELECT id, MAX(amps_a) amps_a, time_stamp
     FROM Table1
     WHERE id = "abcxyz"
       AND time_stamp = 202503
     GROUP BY id, time_stamp) b ON a.id = b.id 
                                AND a.amps_a = b.amps_a 
                                AND a.time_stamp = b.time_stamp

This returns the 2 highest values in march (both 3.1 amps).

I have tried a few other alternative methods, but this is the closest I have gotten. What am I missing?


Solution

  • You cannot refer to a.sum1 from b, which is computed apart before being joined to a.

    Depending on your ultimate goal, you will have to:

    Write your max(amps_a + amps_b + amps_c) within b

    If your goal is to select only the values corresponding to the highest amp, just compute all columns of b independently of a as well as the final query:

    SELECT
      a.id, a.volts_a, a.volts_b, a.volts_c, a.amps_a, a.amps_b, a.amps_c, amps_a + amps_b + amps_c as sum1, a.phase_a_voltage_angle, a.phase_a_current_angle, a.phase_b_voltage_angle, a.phase_b_current_angle, a.phase_c_voltage_angle, a.phase_c_current_angle, a.time_stamp
    FROM Table1 a
    INNER JOIN (
      SELECT id, time_stamp, MAX(amps_a + amps_b + amps_c) max1
      FROM Table1
      GROUP BY id, time_stamp
    ) b ON a.id = b.id and a.time_stamp = b.time_stamp and a.amps_a + a.amps_b + a.amps_c = max1;
    

    Use a window function to put the max next to each line

    If you want to display the highest value of the month for reference next to each row for the month,
    you'll want to use a window function:

    SELECT
      a.id, a.volts_a, a.volts_b, a.volts_c, a.amps_a, a.amps_b, a.amps_c, amps_a + amps_b + amps_c as sum1, a.phase_a_voltage_angle, a.phase_a_current_angle, a.phase_b_voltage_angle, a.phase_b_current_angle, a.phase_c_voltage_angle, a.phase_c_current_angle, a.time_stamp,
      MAX(amps_a + amps_b + amps_c) OVER (PARTITION BY id, time_stamp) max1
    FROM Table1 a;
    

    Tie break

    If you want only 1 row returned even when multiple rows reach the month's max,
    you can transform either of above queries with a row_number() window function:

    SELECT * FROM
    (
        SELECT
          -- Tag each result row with its order within the group (same id, same time_stamp):
          row_number() OVER
          (
              PARTITION BY a.id, a.time_stamp
              ORDER BY /* Here enter your choice criteria between two entries with same id and time_stamp; for now use a decoy sort: */ a.time_stamp DESC
          ) AS group_pos,
          -- The rest is the above query, unmodified:
          a.id, a.volts_a, …
          …
          … and a.amps_a + a.amps_b + a.amps_c = max1
    ) a_with_pos
    WHERE group_pos = 1; -- Then filter to keep only the first row from each group.
    

    row_number() guaranties there will be only 1 row with number 1 within the group (same id, same time_stamp).

    Play with it

    And here is a small demo of the kind of return you will get.