My understanding as per standard practice is that HAVING is to be used along with GROUP BY for filtering conditions, while WHERE is supposed to be used for general row-wise filtering conditions.
However, there are online discussions with mixed conclusions as to whether use HAVING as a superset of WHERE clause. That is, whether it can be used even without GROUP BY in which case it works as a WHERE clause.
I want to understand what is the industry practice in using HAVING clause across Oracle, Microsoft SQL server, MySQL, PostGreSQL and other tools.
A funny thing I observed when executing this query:
SELECT *
FROM SH.SALES
WHERE amount_sold > 1000
HAVING amount_sold < 2000;
It gives an error when executing in Oracle SQL developer desktop whereas runs successfully in Oracle SQL developer web.
This is a great question AND puzzle!
Oracle SQL Developer Web is provided via Oracle REST Data Services (ORDS). There is a RESTful Web Service used to execute 'ad hoc' SQL statements and scripts.
Instead of bringing back all the rows from a query in a single call, we page them. And instead of holding a resultset open and process running, we stick to the RESTful way, and do all the work on a single call and response.
How do we make this happen?
Well, when you type in that query from your question and execute it, on the back end, that's not actually what gets executed.
We wrap that query with another SELECT, and use the ROW_NUMBER() OVER analytic function call. This allows us to 'window' the query results, in this case between rows 1 and 26, or the the first 25 rows of that query, your query.
SELECT *
FROM (
SELECT Q_.*,
ROW_NUMBER() OVER(
ORDER BY 1
) RN___
FROM (
select *
from sh.sales
where amount_sold > 1000
having amount_sold < 2000
) Q_
)
WHERE RN___ BETWEEN :1 AND :2
Ok, but so what?
Well, Optimizer figures out this query can still run, even if the having clause isn't appropriate.
The optimizer is always free to re-arrange a query before searching for best execution plans.
In this case, a 10053 trace shows that a query such as below that came from SQL Dev Web (I'm using EMP but the same applies for any table)
SELECT *
FROM (
SELECT Q_.*,
ROW_NUMBER() OVER(
ORDER BY 1
) RN___
FROM (
SELECT *
FROM emp
WHERE sal > 1000
HAVING sal < 2000
) Q_
)
WHERE RN___ BETWEEN :1 AND :2
got internally transformed to the following before being optimized for plans.
SELECT
subq.EMPNO EMPNO,
subq.ENAME ENAME,
subq.JOB JOB,
subq.MGR MGR,
subq.HIREDATE HIREDATE,
subq.SAL SAL,subq.COMM COMM,
subq.DEPTNO DEPTNO,
subq.RN___ RN___
FROM
(SELECT
EMP.EMPNO EMPNO,
EMP.ENAME ENAME,
EMP.JOB JOB,EMP.MGR MGR,
EMP.HIREDATE HIREDATE,
EMP.SAL SAL,
EMP.COMM COMM,
EMP.DEPTNO DEPTNO,
ROW_NUMBER() OVER ( ORDER BY NULL ) RN___
FROM EMP EMP
WHERE EMP.SAL>1000 AND TO_NUMBER(:B1)>=TO_NUMBER(:B2)
) subq
WHERE subq.RN___>=TO_NUMBER(:B3)
AND subq.RN___<=TO_NUMBER(:B4)
Notice the HAVING has been transformed/optimized out of the query, which lets it pass through onto the execution phase.
Major 👏 to @connor-mcdonald of AskTom fame for helping me parse this out.
And so that's why it works in SQL Developer Web, but NOT in SQL Developer Desktop, where the query is executed exactly as written.