Why does scdensity not work with boundedRight in this code?

I'm trying to get kernal density estimates for values that are bounded [0, 1] (a proportion), and to get value for where specific quantiles occur in that density estimate. The R package scdensity has a function by the same name, which looks like it should do what I want. However, while setting a lower bound works as expected, setting an upper bound does not.

The following code reproduces my problem with a sample dataset.

# Required package
library(scdensity)

# Sample data
x <- c(0.01, 0.3, 0.5, 0.99)

This code chunk demonstrates that the arguments constraint = "boundedLeft" and opts = list(lowerBound = 0) produce the expected result of density estimates of 0 for x < 0.

# Lower bound

dens1 <- scdensity(x, 
                   constraint = c("boundedLeft"),
                   opts = list(lowerBound = 0))
plot(dens1)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values

This chunk demonstrates that constraint = "boundedRight" and opts = list(upperBound = 1) results in non-zero density estimates for x > 0, which seems inconsistent with the what was specified in the arguments.

# Upper bound

dens2 <- scdensity(x, 
                   constraint = c("boundedRight"),
                   opts = list(upperBound = 1))
plot(dens2)

This chunk demonstrates that when both upper and lower bounds are specified, only the lower bound appears to work as expected.

# Lower & upper bounds

dens3 <- scdensity(x, 
                   constraint = c("boundedLeft", "boundedRight"),
                   opts = list(lowerBound = 0, upperBound = 1))
plot(dens3)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values

^{Created on 2024-08-19 with reprex v2.1.0}

Solution

scdensity author here. Thanks for bringing this to my attention! There is indeed a bug. If you need to fix it for immediate use, here's how to modify the source: in weightedKDE.R, starting at line 420, the following code appears:

  if ("boundedRight" %in% P$constraints) {
    gR <- P$g[P$g >= P$upperBound]
    if (!symmetric) {
      AshapeNew <- NormalGridFcn(gR, 0, P$y, P$h)
    } else {
      if (P$upperBound < P$PoS) {
        stop("The upper bound cannot be less than the point of symmetry.")
      }
      if ("boundedLeft" %in% P$constraints) {
        if (!isTRUE(all.equal(P$lowerBound + P$upperBound, 2*P$PoS))) {
          stop("Lower and upper bounds must be equidistant from the point of symmetry.")
        }
      } else {
        AshapeNew <- NormalGridFcn(gR, 0, yL, P$h) + NormalGridFcn(gR, 0, yR, P$h)
        P$Ashape <- rbind(P$Ashape, -AshapeNew)   # <-- PROBLEM
        P$bshape <- c(P$bshape, rep(-P$boundTol, dim(AshapeNew)[1]))   # <-- PROBLEM
      }
    }
  # *** PROBLEM lines should go here ***
  }

The two lines marked PROBLEM are in the wrong place; they should be moved down out of the else block. The way it is shown above, the constraints in AshapeNew never get written to the constraint matrix P$Ashape.

The code has a lot of branching logic due to the many possible combinations of constraints; these lines ended up in the wrong place. Not sure how it escaped my notice during development!

I'll try to get an update through CRAN as soon as I can.

By the way, probably it's already clear to you, but scdensity will only produce estimates that go to zero on both sides, because it's using a Gaussian KDE. So, if your proportion's true distribution is not negligible near 0 and 1, you will need to try something different.