I'm trying to get kernal density estimates for values that are bounded [0, 1] (a proportion), and to get value for where specific quantiles occur in that density estimate. The R package scdensity has a function by the same name, which looks like it should do what I want. However, while setting a lower bound works as expected, setting an upper bound does not.
The following code reproduces my problem with a sample dataset.
# Required package
library(scdensity)
# Sample data
x <- c(0.01, 0.3, 0.5, 0.99)
This code chunk demonstrates that the arguments constraint = "boundedLeft"
and opts = list(lowerBound = 0)
produce the expected result of density estimates of 0 for x < 0.
# Lower bound
dens1 <- scdensity(x,
constraint = c("boundedLeft"),
opts = list(lowerBound = 0))
plot(dens1)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
This chunk demonstrates that constraint = "boundedRight"
and opts = list(upperBound = 1)
results in non-zero density estimates for x > 0, which seems inconsistent with the what was specified in the arguments.
# Upper bound
dens2 <- scdensity(x,
constraint = c("boundedRight"),
opts = list(upperBound = 1))
plot(dens2)
This chunk demonstrates that when both upper and lower bounds are specified, only the lower bound appears to work as expected.
# Lower & upper bounds
dens3 <- scdensity(x,
constraint = c("boundedLeft", "boundedRight"),
opts = list(lowerBound = 0, upperBound = 1))
plot(dens3)
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
Created on 2024-08-19 with reprex v2.1.0
scdensity
author here. Thanks for bringing this to my attention! There is indeed a bug. If you need to fix it for immediate use, here's how to modify the source: in weightedKDE.R
, starting at line 420, the following code appears:
if ("boundedRight" %in% P$constraints) {
gR <- P$g[P$g >= P$upperBound]
if (!symmetric) {
AshapeNew <- NormalGridFcn(gR, 0, P$y, P$h)
} else {
if (P$upperBound < P$PoS) {
stop("The upper bound cannot be less than the point of symmetry.")
}
if ("boundedLeft" %in% P$constraints) {
if (!isTRUE(all.equal(P$lowerBound + P$upperBound, 2*P$PoS))) {
stop("Lower and upper bounds must be equidistant from the point of symmetry.")
}
} else {
AshapeNew <- NormalGridFcn(gR, 0, yL, P$h) + NormalGridFcn(gR, 0, yR, P$h)
P$Ashape <- rbind(P$Ashape, -AshapeNew) # <-- PROBLEM
P$bshape <- c(P$bshape, rep(-P$boundTol, dim(AshapeNew)[1])) # <-- PROBLEM
}
}
# *** PROBLEM lines should go here ***
}
The two lines marked PROBLEM
are in the wrong place; they should be moved down out of the else
block. The way it is shown above, the constraints in AshapeNew
never get written to the constraint matrix P$Ashape
.
The code has a lot of branching logic due to the many possible combinations of constraints; these lines ended up in the wrong place. Not sure how it escaped my notice during development!
I'll try to get an update through CRAN as soon as I can.
By the way, probably it's already clear to you, but scdensity
will only produce estimates that go to zero on both sides, because it's using a Gaussian KDE. So, if your proportion's true distribution is not negligible near 0 and 1, you will need to try something different.