I am trying to do performance tuning on a SQL query in Oracle 12c which is using a window partition. There's an index created on HUB_POL_KEY, PIT_EFF_START_DT on the table PIT. While running the explain plan with /*+ gather_plan_statistics */ hint, I observed there's a Window Sort Step in the Explain Plan which is having an Estimated Row Count of 5000K and an Actual Row Count of 1100. I executed DBMS_STATS.GATHER_TABLE_STATS on the table. When I checked in USER_TAB_COLUMNS table, I see there's no histogram generated for HUB_POL_KEY, PIT_EFF_START_DT. However, there's histogram existing for all other columns.
SQL Query
SELECT
PIT.HUB_POL_KEY,
NVL(LEAD(PIT.PIT_EFF_START_DT) OVER (PARTITION BY PIT.HUB_POL_KEY ORDER BY PIT.PIT_EFF_START_DT) ,TO_DATE('31.12.9999', 'DD.MM.YYYY')) EFF_END_DT
FROM PIT
1st Try:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT');
2nd Try:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS SIZE 254 (HUB_POL_KEY,PIT_EFF_START_DT)'));
Checking Histogram:
SELECT HISTOGRAM FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'PIT'
AND COLUMN_NAME IN ('HUB_POL_KEY','PIT_EFF_START_DT') --NONE
Table Statistics:
SELECT COUNT(*) FROM PIT --5570253
SELECT COLUMN_NAME,NUM_DISTINCT,NUM_BUCKETS,HISTOGRAM FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'PIT'
AND COLUMN_NAME IN ('HUB_POL_KEY','PIT_EFF_START_DT')
+------------------+--------------+-------------+-----------+
| COLUMN_NAME | NUM_DISTINCT | NUM_BUCKETS | HISTOGRAM |
+------------------+--------------+-------------+-----------+
| HUB_POL_KEY | 4703744 | 1 | NONE |
| PIT_EFF_START_DT | 154416 | 1 | NONE |
+------------------+--------------+-------------+-----------+
What am I missing here? Why is the bucket size 1 even when I am running the gather_table_stat procedure with method_opt specifying a size?
The correct syntax as per Oracle documentation should be method_opt=>('FOR COLUMNS (HUB_POL_KEY,PIT_EFF_START_DT) SIZE 254')
. Trying it did not create the histogram stats as expected thought (maybe a bug ¯_(ツ)_/¯).
On the other side using method_opt=>('FOR ALL COLUMNS SIZE 254')
or method_opt=>('FOR COLUMNS <column_name> SIZE 254')
is working fine.
Probably a workaround would be then to gather stats for columns separately:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS HUB_POL_KEY SIZE 254'));
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS PIT_EFF_START_DT SIZE 254'));