Consider the following table source:
id: LONG
name: STRING
ornum: LONG
The goal is to create a new table target which, for each distinct id in source, selects the row with the largest ornum.
The following works fine for a reasonable number of distinct ids:
CREATE OR REPLACE TABLE target AS SELECT DISTINCT ON(id) * FROM test ORDER BY ornum DESC;
Is there a way to handle a large number (e.g. 1B) of distinct values with limited memory (say 4 GB)?
I'm not sure how your SQL solves the problem, given that it has no condition on ornum; you may just be getting lucky.
In DuckDB, you can use the arg_max function, just as @jqurious suggested. This will scale with the number of distinct id values.
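For example, here is a minimal sketch that assumes the columns from your schema and the same test table used in your query:
CREATE OR REPLACE TABLE target AS
SELECT
    id,
    arg_max(name, ornum) AS name,  -- name taken from the row with the largest ornum for this id
    max(ornum) AS ornum
FROM test
GROUP BY id;
Because this runs as a hash aggregate, DuckDB only needs to keep one running result per distinct id rather than sorting the whole table, and recent versions can spill the aggregate hash table to disk if it does not fit in memory.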
The traditional SQL windowing version from @Barmar will have to materialise a sorted version of the entire data set and then pull out only one row per id. You can do this in DuckDB:
SELECT columns(*)
FROM test
WINDOW w AS (PARTITION BY id ORDER BY ornum DESC)
QUALIFY row_number() over w = 1
but it will be a lot slower and consume a lot more RAM/disk space.
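Whichever approach you use, you can also tell DuckDB to stay within your 4 GB budget and spill to a directory of your choice; both settings exist in DuckDB, and the path below is just a placeholder:
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';  -- placeholder path, pick a location with enough free space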