sqldatabasegoogle-bigquerystitch

Query append only table; distinct inner join issue


I have the following SQL that attempts to grab the latest version of every row in a results table, i.e a singular row that has a unique confirmation_number, the max seq and max batch, this is using append-only replication (docs). However it is faulty returning no results, the inner join works as expected but not the outter. Expected result

confirmation_number |  date | seq     | batch
23742902            |  date | max(seq)| max(batch)
  SELECT DISTINCT r.*
    FROM `results_table` r
    INNER JOIN (
      SELECT confirmation_number,
      date,
      MAX(_sdc_sequence) AS seq,
      MAX(_sdc_batched_at) AS batch
      FROM `results_table`
      GROUP BY confirmation_number, date) rr
    ON r.confirmation_number = rr.confirmation_number 
    AND r.date = rr.date
    AND r._sdc_sequence = rr.seq
    AND r._sdc_batched_at = rr.batch

This is the table I want to query against, and my sql is returning no result set. It should return me a singular row that contains a distinct confirmation_number, max seq and max batch

enter image description here


Solution

  • Below is for BigQuery Standard SQL

    I am quite guessing here about your expected result - but below looks to me should give you expected

    #standardSQL
    SELECT AS VALUE ARRAY_AGG(r ORDER BY seq DESC LIMIT 1)[OFFSET(0)]
    FROM `project.dataset.results_table` r
    GROUP BY confirmation_number
    

    As you can see here - it returns one row per confirmation_number - the row which has max seq value for that confirmation_number with all respective values (date, batch, etc.)

    Not sure if you need to group also by date (as it is in the query in your question - but in my mind it is not needed) - but if you do need - it is easy to add