sas kernel-density sgplot

Default bandwidth value used for kernel density in SAS


I'm using SAS to plot a histogram with a kernel density estimate. The documentation says we can choose the parameter c: "the standardized bandwidth for a number that is greater than 0 and less than or equal to 100." But I cannot find the default value of c that was used to create the following plot.

Does someone have an idea? Thanks!


Solution

  • By default, SGPLOT picks the c that minimizes the Asymptotic Mean Integrated Square Error (AMISE) of the kernel density estimate. The documentation for PROC UNIVARIATE, which can also do KDE, states:

    By default, the procedure uses the AMISE method to compute kernel density estimates.

    PROC UNIVARIATE documentation

    We can confirm that they both have the same default by comparing the output.

    proc univariate data=sashelp.cars;
        var horsepower;
        histogram / kernel;
    run;
    

    In the log, we find:

    NOTE: The normal kernel estimate for c=0.7852 has a bandwidth of 21.035 and an AMISE of 392E-7.
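
    To see how c relates to the bandwidth in that note, here is a rough check. It assumes the relationship bandwidth = c * Q * n**(-1/5), where Q is the interquartile range and n the number of nonmissing observations, which is how I read the PROC UNIVARIATE documentation. The data set name stats and the variable names n, iqr and bandwidth are just illustrative.

    ```sas
    /* Get the sample size and interquartile range of horsepower */
    proc means data=sashelp.cars noprint;
        var horsepower;
        output out=stats n=n qrange=iqr;
    run;

    /* Assumed relationship: bandwidth = c * Q * n**(-1/5) */
    data _null_;
        set stats;
        c = 0.7852;                       /* rounded value reported in the log */
        bandwidth = c * iqr * n**(-1/5);
        put bandwidth=;                   /* written to the log for comparison */
    run;
    ```

    If the assumption holds, the value written to the log should land close to the 21.035 reported above.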
    

    Let's plot them together and compare the values.

    proc sgplot data=sashelp.cars;  
       density horsepower/TYPE=KERNEL;  
       density horsepower/TYPE=KERNEL(c=0.7852);
       ods output sgplot;
    run;
    
    data diff;
        set sgplot;
        abs_diff = abs(KERNEL_Horsepower____Y - KERNEL_Horsepower_C_0_7852____Y);
    run;
    
    proc univariate data=diff;
        var abs_diff;
    run;
    


    The average absolute difference across all plotted points is 1.65x10^-9, and the largest is 6.76x10^-9. This is, essentially, zero. The small differences remain because the c-value printed in the log is rounded to four decimal places, while proc sgplot uses the full-precision value internally. You can get a higher-precision estimate of c with the outkernel= option in proc univariate.
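
    A minimal sketch of the outkernel= approach, in case it helps. The output data set name kde is arbitrary, and I am assuming the standardized bandwidth is stored in a variable called _C_ (as I recall from the documentation), so check the printed columns.

    ```sas
    /* Save the kernel density estimate to a data set */
    proc univariate data=sashelp.cars;
        var horsepower;
        histogram / kernel outkernel=kde;
    run;

    /* Print the first rows with more digits than the log note shows */
    proc print data=kde(obs=5);
        format _numeric_ best16.;
    run;
    ```

    The c-value found there can then be passed to TYPE=KERNEL(c=...) in proc sgplot to repeat the comparison with less rounding.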