Can we use DISTINCT and RANK() function together in a select statement for distinct values? Lets take 2 scenarios 1.Using RANK() and DISTINCT 2.Using only RANK() . In these both scenarios will we get same distinct values?
SELECT * FROM (
SELECT distinct
col1,
col2,
col3,
col4,
row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;
SELECT * FROM (
SELECT
col1,
col2,
col3,
col4,
row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;
The windowing function (ROW_NUMBER
here) is computed after any GROUP BY
but before DISTINCT
. Here's what an execution plan looks like for a similar SQL using a random object I selected on one of my databases:
Order of execution (read from most-indented upwards toward least-indented) is: (1) scan the table, (2) compute ROW_NUMBER (WINDOW)
, (3) apply DISTINCT (HASH UNIQUE)
.
This means that DISTINCT
will not impact the result of ROW_NUMBER
in the same query block. It could change the results of the outer query block if placed there, but in your case it won't because of your rn = 1
predicate. That will select only one row per (col1,col2)
. There's nothing left for DISTINCT
to do, though it will have to pass through another (unnecessary) sort operation to honor the request.
Word of wisdom: avoid frequent use of DISTINCT
. I see developers overusing it all the time. In most cases when you want to reduce granularity, GROUP BY
is a better choice. Usually when I see DISTINCT
in developers' code it's an attempt to hide duplicate rows caused by a many-to-many problem in their query, rather than fixing the many-to-many itself. And even if it's not hiding a problem, it is confusing to others when it's there for no purpose.