rkernel-density

Why does scdensity not work with boundedRight in this code?


I'm trying to get kernal density estimates for values that are bounded [0, 1] (a proportion), and to get value for where specific quantiles occur in that density estimate. The R package scdensity has a function by the same name, which looks like it should do what I want. However, while setting a lower bound works as expected, setting an upper bound does not.

The following code reproduces my problem with a sample dataset.

# Required package
library(scdensity)

# Sample data
x <- c(0.01, 0.3, 0.5, 0.99)

This code chunk demonstrates that the arguments constraint = "boundedLeft" and opts = list(lowerBound = 0) produce the expected result of density estimates of 0 for x < 0.

# Lower bound

dens1 <- scdensity(x, 
                   constraint = c("boundedLeft"),
                   opts = list(lowerBound = 0))
plot(dens1)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values

This chunk demonstrates that constraint = "boundedRight" and opts = list(upperBound = 1) results in non-zero density estimates for x > 0, which seems inconsistent with the what was specified in the arguments.

# Upper bound

dens2 <- scdensity(x, 
                   constraint = c("boundedRight"),
                   opts = list(upperBound = 1))
plot(dens2)

This chunk demonstrates that when both upper and lower bounds are specified, only the lower bound appears to work as expected.

# Lower & upper bounds

dens3 <- scdensity(x, 
                   constraint = c("boundedLeft", "boundedRight"),
                   opts = list(lowerBound = 0, upperBound = 1))
plot(dens3)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values

Created on 2024-08-19 with reprex v2.1.0


Solution

  • scdensity author here. Thanks for bringing this to my attention! There is indeed a bug. If you need to fix it for immediate use, here's how to modify the source: in weightedKDE.R, starting at line 420, the following code appears:

      if ("boundedRight" %in% P$constraints) {
        gR <- P$g[P$g >= P$upperBound]
        if (!symmetric) {
          AshapeNew <- NormalGridFcn(gR, 0, P$y, P$h)
        } else {
          if (P$upperBound < P$PoS) {
            stop("The upper bound cannot be less than the point of symmetry.")
          }
          if ("boundedLeft" %in% P$constraints) {
            if (!isTRUE(all.equal(P$lowerBound + P$upperBound, 2*P$PoS))) {
              stop("Lower and upper bounds must be equidistant from the point of symmetry.")
            }
          } else {
            AshapeNew <- NormalGridFcn(gR, 0, yL, P$h) + NormalGridFcn(gR, 0, yR, P$h)
            P$Ashape <- rbind(P$Ashape, -AshapeNew)   # <-- PROBLEM
            P$bshape <- c(P$bshape, rep(-P$boundTol, dim(AshapeNew)[1]))   # <-- PROBLEM
          }
        }
      # *** PROBLEM lines should go here ***
      }
    
    

    The two lines marked PROBLEM are in the wrong place; they should be moved down out of the else block. The way it is shown above, the constraints in AshapeNew never get written to the constraint matrix P$Ashape.

    The code has a lot of branching logic due to the many possible combinations of constraints; these lines ended up in the wrong place. Not sure how it escaped my notice during development!

    I'll try to get an update through CRAN as soon as I can.

    By the way, probably it's already clear to you, but scdensity will only produce estimates that go to zero on both sides, because it's using a Gaussian KDE. So, if your proportion's true distribution is not negligible near 0 and 1, you will need to try something different.