I have a table with the following columns:
city_id
= city IDcity_name
= city nameshape
= polygon -> geography fielddate
= date when the shape was createdand populated with the following sample data:
city_id | city_name | shape | created_date |
---|---|---|---|
1 | abcd | POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) | 2023-04-24 |
1 | abcd | POLYGON((12.967481 103.314121, 12.947563293673372 103.3669285559756, 12.90735564458858 103.32701845345733, 12.962686563318833 103.30510874799613, 12.967481 103.314121)) | 2023-04-24 |
1 | abcd | POLYGON((14.967481 103.314121, 14.947563293673372 103.3669285559756, 14.90735564458858 103.32701845345733, 14.962686563318833 103.30510874799613, 14.967481 103.314121)) | 2023-04-24 |
2 | efgh | POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) | 2023-04-11 |
2 | efgh | POLYGON((8.967481 103.314121, 8.947563293673372 103.3669285559756, 8.90735564458858 103.32701845345733, 8.962686563318833 103.30510874799613, 8.967481 103.314121)) | 2023-04-11 |
2 | efgh | POLYGON((22.967481 103.314121, 22.947563293673372 103.3669285559756, 22.90735564458858 103.32701845345733, 22.962686563318833 103.30510874799613, 22.967481 103.314121)) | 2023-03-23 |
3 | ijkl | POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) | 2023-04-10 |
3 | ijkl | POLYGON((28.967481 103.314121, 28.947563293673372 103.3669285559756, 28.90735564458858 103.32701845345733, 28.962686563318833 103.30510874799613, 28.967481 103.314121)) | 2023-03-03 |
I've crafted the following query:
SELECT city_id,
city_name,
shape,
date
FROM (SELECT city_id,
city_name,
ANY_VALUE(shape) AS shape,
date,
ROW_NUMBER() OVER (PARTITION BY city_id, city_name ORDER BY date DESC) AS rank
FROM test
GROUP BY 1,2,4,5)
WHERE rank = 1
ORDER BY date DESC
Current result:
city_id | city_name | shape | created_date |
---|---|---|---|
1 | abcd | POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) | 2023-04-24 |
2 | efgh | POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) | 2023-04-11 |
3 | ijkl | POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) | 2023-04-10 |
With this query, I was hoping that using the PARTITION
function and rank = 1
would return all the records for a city where several shapes have been created the same day, for the latest date. The query works fine for cities with only 1 shape created on the latest date.
Expected output:
city_id | city_name | shape | created_date |
---|---|---|---|
1 | abcd | POLYGON((18.967481 103.314121, 18.947563293673372 103.3669285559756, 18.90735564458858 103.32701845345733, 18.962686563318833 103.30510874799613, 18.967481 103.314121)) | 2023-04-24 |
1 | abcd | POLYGON((12.967481 103.314121, 12.947563293673372 103.3669285559756, 12.90735564458858 103.32701845345733, 12.962686563318833 103.30510874799613, 12.967481 103.314121)) | 2023-04-24 |
1 | abcd | POLYGON((14.967481 103.314121, 14.947563293673372 103.3669285559756, 14.90735564458858 103.32701845345733, 14.962686563318833 103.30510874799613, 14.967481 103.314121)) | 2023-04-24 |
2 | efgh | POLYGON((16.967481 103.314121, 16.947563293673372 103.3669285559756, 16.90735564458858 103.32701845345733, 16.962686563318833 103.30510874799613, 16.967481 103.314121)) | 2023-04-11 |
2 | efgh | POLYGON((8.967481 103.314121, 8.947563293673372 103.3669285559756, 8.90735564458858 103.32701845345733, 8.962686563318833 103.30510874799613, 8.967481 103.314121)) | 2023-04-11 |
3 | ijkl | POLYGON((24.967481 103.314121, 24.947563293673372 103.3669285559756, 24.90735564458858 103.32701845345733, 24.962686563318833 103.30510874799613, 24.967481 103.314121)) | 2023-04-10 |
Any idea on how to return all the shapes created on the latest date for each city?
Thank you in advance!
The DENSE_RANK
window function is better used here. It's cousin ROW_NUMBER
allows unique ranking values only, regardless of ties.
SELECT city_id, city_name, shape, date
FROM (SELECT city_id, city_name, shape, date,
DENSE_RANK() OVER (PARTITION BY city_id ORDER BY date DESC) AS rn
FROM test) cte
WHERE rn = 1
ORDER BY date DESC
Also you can remove:
GROUP BY
clause from the subqueryPARTITION BY
clause, assuming that your "city_id" values uniquely identify cities.