I have a query that produces 529,032 rows, which I store into a table called EFOT_DISTINCT_COMBOS
.
(This used to be a materialized view, but our server administrators recommended that we use plain tables instead, and manually "refresh" the data from queries, using stored procedures.)
What I need to do with it next, is a CROSS JOIN
, which is encountering a major problem ... the server locks up and just hangs, without doing any work ... (library cache lock?)
So it looks like I need to replace the CROSS JOIN
in my query, and find a better way to write this query.
This query looks like a nightmare, but it's actually MUCH simpler than it looks - it just calculates some simple conditional totals.
(Months ago I had tried using window functions for this, but the window functions broke something (I don't remember what exactly) so I had to revert back to simple aggregates for some reason.)
The table we are CROSS JOIN
ing, is called SNAPSHOTS
, and it is a very short table - only containing 44 rows (it won't grow much bigger, maybe maximum 80-100 rows eventually).
529,032 x 44 = 23,277,408 rows produced by this CROSS JOIN
.
I tried to reduce this footprint, by adding HAVING COUNT(ef.TIMESTAMP) > 0
at the bottom, which removes probably 75% of the results (which I don't necessarily need).
This definitely helps, but the query still fails to execute on the server.
SELECT
ss.TIMESTAMP,
ss.TIMESTAMP_DATE,
fc.FISCAL_QUARTER,
fc.LOB,
fc.FREQUENCY,
fc.EDGE_VP,
fc.EDGE_RM,
fc.EDGE_ASM,
NVL(SUM(ef.PIPELINE),0) AS PIPELINE,
NVL(SUM(ef.BEST),0) AS BEST,
NVL(SUM(ef.FORECAST),0) AS FORECAST,
NVL(SUM(ef.CLOSED),0) AS CLOSED,
CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PIPELINE_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_PIPELINE_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_PIPELINE_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_PIPELINE_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_PIPELINE_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_PIPELINE_DIFF),0) END AS PIPELINE_CHANGE,
CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.BEST_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_BEST_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_BEST_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_BEST_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_BEST_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_BEST_DIFF),0) END AS BEST_CHANGE,
CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.FORECAST_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_FORECAST_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_FORECAST_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_FORECAST_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_FORECAST_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_FORECAST_DIFF),0) END AS FORECAST_CHANGE,
CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.CLOSED_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_CLOSED_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_CLOSED_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_CLOSED_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_CLOSED_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_CLOSED_DIFF),0) END AS CLOSED_CHANGE,
CASE WHEN CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_PIPELINE),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_PIPELINE),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_PIPELINE),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_PIPELINE),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_PIPELINE),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_PIPELINE),0) END = 0 THEN 0
ELSE 100 * CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PIPELINE_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_PIPELINE_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_PIPELINE_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_PIPELINE_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_PIPELINE_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_PIPELINE_DIFF),0) END
/ CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_PIPELINE),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_PIPELINE),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_PIPELINE),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_PIPELINE),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_PIPELINE),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_PIPELINE),0) END END AS PIPELINE_PERCENT_CHANGE,
CASE WHEN CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_BEST),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_BEST),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_BEST),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_BEST),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_BEST),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_BEST),0) END = 0 THEN 0
ELSE 100 * CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.BEST_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_BEST_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_BEST_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_BEST_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_BEST_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_BEST_DIFF),0) END
/ CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_BEST),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_BEST),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_BEST),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_BEST),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_BEST),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_BEST),0) END END AS BEST_PERCENT_CHANGE,
CASE WHEN CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_FORECAST),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_FORECAST),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_FORECAST),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_FORECAST),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_FORECAST),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_FORECAST),0) END = 0 THEN 0
ELSE 100 * CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.FORECAST_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_FORECAST_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_FORECAST_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_FORECAST_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_FORECAST_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_FORECAST_DIFF),0) END
/ CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_FORECAST),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_FORECAST),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_FORECAST),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_FORECAST),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_FORECAST),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_FORECAST),0) END END AS FORECAST_PERCENT_CHANGE,
CASE WHEN CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_CLOSED),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_CLOSED),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_CLOSED),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_CLOSED),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_CLOSED),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_CLOSED),0) END = 0 THEN 0
ELSE 100 * CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.CLOSED_DIFF),0)
WHEN 'D' THEN NVL(SUM(ef.D_CLOSED_DIFF),0)
WHEN 'W' THEN NVL(SUM(ef.W_CLOSED_DIFF),0)
WHEN 'M' THEN NVL(SUM(ef.M_CLOSED_DIFF),0)
WHEN 'Q' THEN NVL(SUM(ef.Q_CLOSED_DIFF),0)
WHEN 'Y' THEN NVL(SUM(ef.Y_CLOSED_DIFF),0) END
/ CASE fc.FREQUENCY WHEN '%' THEN NVL(SUM(ef.PREV_CLOSED),0)
WHEN 'D' THEN NVL(SUM(ef.PREV_D_CLOSED),0)
WHEN 'W' THEN NVL(SUM(ef.PREV_W_CLOSED),0)
WHEN 'M' THEN NVL(SUM(ef.PREV_M_CLOSED),0)
WHEN 'Q' THEN NVL(SUM(ef.PREV_Q_CLOSED),0)
WHEN 'Y' THEN NVL(SUM(ef.PREV_Y_CLOSED),0) END END AS CLOSED_PERCENT_CHANGE -- ,
-- COUNT(ef.TIMESTAMP) AS DEAL_SS_COUNT
FROM EFOT_DISTINCT_COMBOS fc
CROSS JOIN SNAPSHOTS ss
LEFT JOIN EDGE_FORECAST_OVER_TIME ef
ON ef.TIMESTAMP = ss.TIMESTAMP
AND ef.fq_selection = NVL(fc.FISCAL_QUARTER,'%')
AND ef.lob_selection = NVL(fc.LOB,'%')
AND ((fc.FREQUENCY = 'D' AND ef.FREQUENCY in('D', 'W', 'M', 'Q', 'Y') AND ef.TIMESTAMP_DATE >= TRUNC(LOCALTIMESTAMP) - INTERVAL '1' MONTH)
OR (fc.FREQUENCY = 'W' AND TO_CHAR(ef.TIMESTAMP, 'fmDAY') = 'MONDAY' AND ef.TIMESTAMP_DATE >= TRUNC(LOCALTIMESTAMP) - INTERVAL '3' MONTH)
OR (fc.FREQUENCY = 'M' AND ef.FREQUENCY in('M', 'Q', 'Y') AND ef.TIMESTAMP_DATE >= TRUNC(LOCALTIMESTAMP) - INTERVAL '1' YEAR)
OR (fc.FREQUENCY = 'Q' AND ef.FREQUENCY in('Q', 'Y'))
OR ef.FREQUENCY LIKE NVL(fc.FREQUENCY,'%'))
AND ef.EDGE_VP LIKE NVL(fc.EDGE_VP,'%')
AND ef.EDGE_RM LIKE NVL(fc.EDGE_RM,'%')
AND ef.EDGE_ASM LIKE NVL(fc.EDGE_ASM,'%')
WHERE ss.TABLE_NAME = 'EDGE_FORECAST'
GROUP BY ss.TIMESTAMP ,
ss.TIMESTAMP_DATE ,
fc.FISCAL_QUARTER ,
fc.LOB ,
fc.FREQUENCY ,
fc.EDGE_VP ,
fc.EDGE_RM ,
fc.EDGE_ASM
HAVING COUNT(ef.TIMESTAMP) > 0;
Next, tried splitting this query into two queries.
CROSS JOIN
and create the 23,277,408 rows with 0's for the total columns.UPDATE
all 23,277,408 rows and compute the values for the total columns.The 1st query worked great, executed in about 12 minutes.
But the 2nd query became even more complicated, and it ran for almost 6 hours, and eventually failed with:
ORA-01555: snapshot too old
.
Is there a better way I could write this, so the server can actually get the work done?
This query used to work previously, before we added all the LOB
filters (I guess that one extra filter pushed it over the edge, in terms of execution plan complexity).
(PasteBin because these queries are too long for StackOverflow.)
1st Query:
Gordon Linoff commented:
"The cost of an expensive query is almost always due to data movement -- joins, aggregations, and sorting. There are exceptions. For instance, long text fields or complex JSON can add significant overhead, but it general it is data movement."
Additionally, Del commented:
"A few things after a quick look, I think your query is too verbose. You are doing a LEFT JOIN on the ef table. But then your HAVING clause is checking that a COUNT on that table is > 0. Just switch it to an INNER JOIN. After that , you may be able to remove some of those NVL functions you have throughout the code if the column is required in the ef table."
These two comments both mentioned the join, as a probable cause of performance problems.
I followed Del's advice, and removed the HAVING
clause, and converted the LEFT JOIN
to INNER JOIN
.
I also removed most of the NVL
functions where I could, or consolidated them to make the query simpler.
With just those two changes, the query now executes successfully in 15 minutes, instead of running for hours and eventually failing.
I'm willing to bet the LEFT JOIN
was 99% of the problem.
Thanks to everyone who took the time to read all my ridiculous code and give me tips!