I have scenario where table0
is having one to many relation with table1
. Both can be updated separately. I needed to query recently updated rows based on the latest updated date across both tables. The query is paginated as well. So, suppose if only table0
is there, this is the query
SELECT
"table_0"."id",
"table_0"."short_name"
FROM
"table_0"
ORDER BY
"table_0"."updated_at" DESC
LIMIT
10 OFFSET 0;
table_0
id | short_name | updated_at |
---|---|---|
t0_3 | name 3 | 93 |
t0_2 | name 2 | 57 |
t0_1 | name 1 | 46 |
table_1
id | table_0_id | updated_at |
---|---|---|
t1_7 | t0_2 | 94 |
t1_6 | t0_3 | 92 |
t1_5 | t0_3 | 67 |
t1_4 | t0_1 | 61 |
t1_3 | t0_1 | 59 |
t1_2 | t0_2 | 58 |
t1_1 | t0_1 | 47 |
Expected Output:
id | short_name | updated_at |
---|---|---|
t0_2 | name 2 | 94 |
t0_3 | name 3 | 93 |
t0_1 | name 1 | 61 |
I have this query which worked, but I am not sure If this is best solution
-- For someone who wish to replicate the scenario
-- CREATE TABLE table_0(id TEXT, short_name TEXT, updatedAt INT);
-- INSERT INTO table_0 VALUES('t0_1', 'Name1', 46);
-- INSERT INTO table_0 VALUES('t0_2', 'Name2', 57);
-- INSERT INTO table_0 VALUES('t0_3', 'Name3', 93);
-- CREATE TABLE table_1(id TEXT, table_0_id TEXT, updatedAt INT);
-- INSERT INTO table_1 VALUES('t1_7', 't0_2', 94);
-- INSERT INTO table_1 VALUES('t1_6', 't0_3', 92);
-- INSERT INTO table_1 VALUES('t1_5', 't0_3', 67);
-- INSERT INTO table_1 VALUES('t1_4', 't0_1', 61);
-- INSERT INTO table_1 VALUES('t1_3', 't0_1', 59);
-- INSERT INTO table_1 VALUES('t1_2', 't0_2', 58);
-- INSERT INTO table_1 VALUES('t1_1', 't0_1', 47);
SELECT * FROM table_0;
SELECT * FROM table_1;
WITH "table_1_max" AS (
SELECT
DISTINCT "table_1"."table_0_id",
MAX("table_1"."updatedat") AS "updatedat"
FROM
"table_1"
GROUP BY
"table_1"."table_0_id"
ORDER BY
"updatedat" DESC
)
SELECT
"table_0"."id",
"table_0"."short_name",
CASE
WHEN "table_1_max"."updatedat" > "table_0"."updatedat" THEN "table_1_max"."updatedat"
ELSE "table_0"."updatedat"
END AS "updatedat"
FROM
"table_0"
INNER JOIN "table_1_max" ON "table_1_max"."table_0_id" = "table_0"."id"
ORDER BY
"updatedat" DESC,
"table_0"."short_name" ASC
LIMIT
10 OFFSET 0;
Note that if your data is big, and you have to query it frequently while writes are less frequent you could consider denormalizing your data by adding a table_0.hierarchyupdateat
column, filled by trigger on insert and update on table0
and table1
.
This trigger is all the more simple as from its name it's a "last occuring wins" rule, so you won't even have to compare the date to other rows and even to the current value of the column, before updating it.
Beware to think of the theorical case if you can remove the most recent row: then you'll have to add an on delete
trigger, to recompute the column with a traditional query like the one you did.
And of course table_0.hierarchyupdateat
can be indexed (which you can't do with the way your query fetches table_1.updateat
before comparing it to table0.updatedat
).
That said, globally your query is right. There are some bits that you can make it simpler:
GROUP BY
acts as a DISTINCT
, so you can remove the DISTINCT
GREATEST(x, y)
can replace a CASE WHEN x > y THEN x ELSE y END
SELECT t0.id … FROM table_0 t0 …
INNER JOIN
: I suppose it means there's always at least one row from table_1
for each table_0
? Else use a LEFT JOIN
Another option is to do everything in the main query by directly joining to table_1
instead of table_1_max
, GROUP
ing by every column of table_0
.
But if id
is its primary key, by grouping only onto it, PostgreSQL will understand that every other column of table_0
is grouped too.
Which would result in:
ALTER TABLE table_0 ADD PRIMARY KEY (id);
SELECT
t0."id",
t0."short_name",
GREATEST(t0.updatedat, MAX(t1.updatedat)) updatedat
FROM
table_0 t0
INNER JOIN table_1 t1 ON t1."table_0_id" = t0."id"
GROUP BY
t0.id
ORDER BY
GREATEST(t0.updatedat, MAX(t1.updatedat)) DESC,
t0."short_name" ASC
LIMIT
10 OFFSET 0;
id | short_name | updatedat |
---|---|---|
t0_2 | Name2 | 94 |
t0_3 | Name3 | 93 |
t0_1 | Name1 | 61 |
Although this query looks simpler, you should EXPLAIN ANALYZE
it to make sure it responds to your needs in terms of performance.
(and you can see it in a fiddle)