Introduction
I am trying to generate a boundary-corrected kernel density estimate of a set of values that contains many zeroes but cannot go below zero (percent cover of the land surface in trees; a negative percent coverage is not possible). I have successfully used evmix::dbckden() to generate what appears to be an appropriate series of Y values, but these are no longer tied to the X values of the initial data set.
Non-reproducible example, directly referential to my data:
> min(land_coverage);max(land_coverage) # raw input data ranges from 0 to 71.78746
[1] 0
[1] 71.78746
> d.land_coverage <- density(land_coverage)
> plot(d.land_coverage, main = "KDE of values in numeric \nvector 'land_coverage'")
> xp <- seq(0, 100, length.out = 512) # I think the problem may lie in the choice of values for this object
> d.land_coverage_bounded <- evmix::dbckden(xp, land_coverage, lambda = 1, bcmethod = "simple")
> plot(d.land_coverage_bounded) # Y values are transformed to be equal to or greater than 0
> str(d.land_coverage) # includes X and Y values
List of 7
$ x : num [1:512] -7.73 -7.56 -7.39 -7.22 -7.05 ...
$ y : num [1:512] 9.82e-05 1.22e-04 1.49e-04 1.83e-04 2.22e-04 ...
$ bw : num 2.58
$ n : int 636
$ call : language density.default(x = land_coverage)
$ data.name: chr "land_coverage"
$ has.na : logi FALSE
- attr(*, "class")= chr "density"
> str(d.land_coverage_bounded) # Numeric vector of Y values only, X have been lost
num [1:512] 0.083 0.0733 0.0668 0.0626 0.0601 ...
Reproducible example, with dummy data:
> data_dummy <- rnorm(1000, 5, 2)
> data_dummy[data_dummy < 0] <- 0 # ensure no values are < 0
> min(data_dummy);max(data_dummy) # print minimum and maximum values
[1] 0
[1] 10.72429
> plot(density(data_dummy)) # plot KDE, with some density below 0
> xp <- seq(0, 10, 0.01)
> data_dummy_corrected <- evmix::dbckden(xp, data_dummy, bw = 1, bcmethod = "simple")
> plot(data_dummy_corrected)
Question: How do I keep the density values mapped to the initial X values when using evmix::dbckden() to generate a boundary-corrected KDE?
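You just need to estimate both densities on the same grid; the two density vectors are then 'mapped' to the same x values. dbckden() returns one density value per element of its first argument, so the grid xp itself supplies the X values.
Set the length of the grid vector to a power of 2, because density() rounds n up to a power of 2 internally whenever n is above 512.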
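library(evmix)  # provides dbckden()

# Common evaluation grid; it starts below 0 so the uncorrected KDE's leakage past the boundary is visible
xp <- seq(from = -2, to = ceiling(max(data_dummy)), length.out = 1024)
# Ordinary KDE evaluated on exactly that grid
d <- density(data_dummy, from = xp[1], to = xp[length(xp)], n = length(xp), bw = 1)$y
# Boundary-corrected KDE evaluated at the same points
d_cor <- dbckden(xp, data_dummy, bcmethod = "simple", bw = 1)
# One row per grid point, so both densities share the x column
results <- data.frame(x = xp, Density = d, corrected = d_cor)
plot(Density ~ x, results, type = "l", col = "#FECD6C", ylim = c(0, 0.2))
lines(corrected ~ x, results, col = "#6c9dfe", lty = 2)
legend("topleft", c("KDE", "corrected KDE"), col = c("#FECD6C", "#6c9dfe"),
       lty = c(1, 2))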
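If you want to convince yourself that the grid really is shared, a quick check along these lines should do it. This is only a sketch in base R: the set.seed(1) call and the name grid_matches are my own additions, and the data simply mirrors the dummy data from the question. It compares the x values that density() reports against the grid that was passed in.

```r
# Stand-in data mirroring the question's dummy example (the seed is an assumption, for repeatability)
set.seed(1)
data_dummy <- rnorm(1000, 5, 2)
data_dummy[data_dummy < 0] <- 0  # ensure no values are < 0

# Shared evaluation grid, as in the answer above
xp <- seq(from = -2, to = ceiling(max(data_dummy)), length.out = 1024)

# Keep the whole density object this time so that $x can be inspected
d <- density(data_dummy, from = xp[1], to = xp[length(xp)], n = length(xp), bw = 1)

# grid_matches (hypothetical name) is TRUE when density()'s x values coincide with xp
grid_matches <- isTRUE(all.equal(d$x, xp))
print(grid_matches)
print(length(d$y) == length(xp))
```

When both checks pass, d$y can be placed next to any other vector evaluated at xp (such as the dbckden() output) and the rows line up by position.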