sqlgoogle-bigquerypartition-by

Return all the results for the latest date


I have a table with the following columns:

and populated with the following sample data:

city_id city_name shape created_date
1 abcd POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) 2023-04-24
1 abcd POLYGON((12.967481 103.314121, 12.947563293673372 103.3669285559756, 12.90735564458858 103.32701845345733, 12.962686563318833 103.30510874799613, 12.967481 103.314121)) 2023-04-24
1 abcd POLYGON((14.967481 103.314121, 14.947563293673372 103.3669285559756, 14.90735564458858 103.32701845345733, 14.962686563318833 103.30510874799613, 14.967481 103.314121)) 2023-04-24
2 efgh POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) 2023-04-11
2 efgh POLYGON((8.967481 103.314121, 8.947563293673372 103.3669285559756, 8.90735564458858 103.32701845345733, 8.962686563318833 103.30510874799613, 8.967481 103.314121)) 2023-04-11
2 efgh POLYGON((22.967481 103.314121, 22.947563293673372 103.3669285559756, 22.90735564458858 103.32701845345733, 22.962686563318833 103.30510874799613, 22.967481 103.314121)) 2023-03-23
3 ijkl POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) 2023-04-10
3 ijkl POLYGON((28.967481 103.314121, 28.947563293673372 103.3669285559756, 28.90735564458858 103.32701845345733, 28.962686563318833 103.30510874799613, 28.967481 103.314121)) 2023-03-03

I've crafted the following query:

SELECT city_id, 
       city_name, 
       shape,
       date
FROM (SELECT city_id, 
             city_name, 
             ANY_VALUE(shape) AS shape, 
             date, 
             ROW_NUMBER() OVER (PARTITION BY city_id, city_name ORDER BY date DESC) AS rank
      FROM test
      GROUP BY 1,2,4,5)
WHERE rank = 1
ORDER BY date DESC

Current result:

city_id city_name shape created_date
1 abcd POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) 2023-04-24
2 efgh POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) 2023-04-11
3 ijkl POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) 2023-04-10

With this query, I was hoping that using the PARTITION function and rank = 1 would return all the records for a city where several shapes have been created the same day, for the latest date. The query works fine for cities with only 1 shape created on the latest date.

Expected output:

city_id city_name shape created_date
1 abcd POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) 2023-04-24
1 abcd POLYGON((12.967481 103.314121, 12.947563293673372 103.3669285559756, 12.90735564458858 103.32701845345733, 12.962686563318833 103.30510874799613, 12.967481 103.314121)) 2023-04-24
1 abcd POLYGON((14.967481 103.314121, 14.947563293673372 103.3669285559756, 14.90735564458858 103.32701845345733, 14.962686563318833 103.30510874799613, 14.967481 103.314121)) 2023-04-24
2 efgh POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) 2023-04-11
2 efgh POLYGON((8.967481 103.314121, 8.947563293673372 103.3669285559756, 8.90735564458858 103.32701845345733, 8.962686563318833 103.30510874799613, 8.967481 103.314121)) 2023-04-11
3 ijkl POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) 2023-04-10

Any idea on how to return all the shapes created on the latest date for each city?

Thank you in advance!


Solution

  • The DENSE_RANK window function is better used here. It's cousin ROW_NUMBER allows unique ranking values only, regardless of ties.

    SELECT city_id, city_name, shape, date
    FROM (SELECT city_id, city_name, shape, date, 
                 DENSE_RANK() OVER (PARTITION BY city_id ORDER BY date DESC) AS rn
          FROM test) cte
    WHERE rn = 1
    ORDER BY date DESC
    

    Also you can remove: