So, I didn't know originally that there was a percentile_cont
function and I was trying to computer the 90th percentile of a number zeries by looking up the z-score (1.28155) and just doing the math. But I get ever more divergent numbers between my implementation and the built in function. Is this because of the inaccuracy of my z-score or do I simply misunderstand something here?
select round(((1.28155 * stddev(num)) + avg(num)),1) as my_number,
percentile_cont(.90) within group ( order by num) as NINETIETH from (
select level as num from dual connect by level <= 1000
)
results:
MY_NUMBER | NINETIETH |
---|---|
870.6 | 900.1 |
The z-score assumes that the distribution is a normal distribution. The generated value is the 90th percentile of a normal distribution that has the same mean and standard deviation as your data. Unfortunately, your distribution is not a normal distribution, it is a uniform distribution.
If you generate a normal distribution:
SELECT ROUND(1.28155 * s.stddev + s.mean,1) AS my_number,
( SELECT PERCENTILE_CONT(.90) WITHIN GROUP (
ORDER BY DBMS_RANDOM.NORMAL() * s.stddev + s.mean
)
FROM DUAL
CONNECT BY LEVEL <= 1000
) AS NINETIETH
FROM (SELECT 500 AS mean, 250 AS stddev FROM DUAL) s
Then the output may (randomly) be:
MY_NUMBER | NINETIETH |
---|---|
820.4 | 818.176123750928889247317476609317779925 |
Which is a much closer approximation.
If you want to find the 90th percentile for a uniform distribution then use:
SELECT 0.9 * (MAX(num) - MIN(num)) + MIN(num) as my_number,
percentile_cont(.90) within group ( order by num) as NINETIETH
FROM (
SELECT level as num
FROM dual
CONNECT BY LEVEL <= 1000
)
Which outputs:
MY_NUMBER | NINETIETH |
---|---|
900.1 | 900.1 |