[SOLVED] Is it right to use DISTINCT and RANK() function together for distinct values in a select statement?

Is it right to use DISTINCT and RANK() function together for distinct values in a select statement?

Can we use DISTINCT and RANK() function together in a select statement for distinct values? Lets take 2 scenarios 1.Using RANK() and DISTINCT 2.Using only RANK() . In these both scenarios will we get same distinct values?

SELECT * FROM (
SELECT distinct
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;


SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table) 
WHERE rn = 1;

Solution

The windowing function (ROW_NUMBER here) is computed after any GROUP BY but before DISTINCT. Here's what an execution plan looks like for a similar SQL using a random object I selected on one of my databases:

Order of execution (read from most-indented upwards toward least-indented) is: (1) scan the table, (2) compute ROW_NUMBER (WINDOW), (3) apply DISTINCT (HASH UNIQUE).

This means that DISTINCT will not impact the result of ROW_NUMBER in the same query block. It could change the results of the outer query block if placed there, but in your case it won't because of your rn = 1 predicate. That will select only one row per (col1,col2). There's nothing left for DISTINCT to do, though it will have to pass through another (unnecessary) sort operation to honor the request.

Word of wisdom: avoid frequent use of DISTINCT. I see developers overusing it all the time. In most cases when you want to reduce granularity, GROUP BY is a better choice. Usually when I see DISTINCT in developers' code it's an attempt to hide duplicate rows caused by a many-to-many problem in their query, rather than fixing the many-to-many itself. And even if it's not hiding a problem, it is confusing to others when it's there for no purpose.