sqloracle-databasestatistics

Computing 90th percentile manually vs function doesn't produce identical results, why?


So, I didn't know originally that there was a percentile_cont function and I was trying to computer the 90th percentile of a number zeries by looking up the z-score (1.28155) and just doing the math. But I get ever more divergent numbers between my implementation and the built in function. Is this because of the inaccuracy of my z-score or do I simply misunderstand something here?

select round(((1.28155 * stddev(num)) + avg(num)),1) as my_number,
       percentile_cont(.90) within group ( order by num) as NINETIETH from (
    select level as num from dual connect by level <= 1000
)

results:

MY_NUMBER NINETIETH
870.6 900.1

Solution

  • The z-score assumes that the distribution is a normal distribution. The generated value is the 90th percentile of a normal distribution that has the same mean and standard deviation as your data. Unfortunately, your distribution is not a normal distribution, it is a uniform distribution.

    If you generate a normal distribution:

    SELECT ROUND(1.28155 * s.stddev + s.mean,1) AS my_number,
           ( SELECT PERCENTILE_CONT(.90) WITHIN GROUP (
                      ORDER BY DBMS_RANDOM.NORMAL() * s.stddev + s.mean
                    )
             FROM   DUAL
             CONNECT BY LEVEL <= 1000
           ) AS NINETIETH
    FROM   (SELECT 500 AS mean, 250 AS stddev FROM DUAL) s
    

    Then the output may (randomly) be:

    MY_NUMBER NINETIETH
    820.4 818.176123750928889247317476609317779925

    Which is a much closer approximation.


    If you want to find the 90th percentile for a uniform distribution then use:

    SELECT 0.9 * (MAX(num) - MIN(num)) + MIN(num) as my_number,
           percentile_cont(.90) within group ( order by num) as NINETIETH
    FROM   (
      SELECT level as num
      FROM   dual
      CONNECT BY LEVEL <= 1000
    )
    

    Which outputs:

    MY_NUMBER NINETIETH
    900.1 900.1

    fiddle