r function optimization bayesian credible-interval

Using optimize() to find the shortest interval that takes 95% area under a curve in R

Background:

I have a curve whose Y-values are produced by my small R function below (neatly annotated). If you run my entire R code, you see my curve (but remember, it's a function so if I changed the argument values, I could get a different curve):

Question:

Obviously, one can determine/assume many intervals that would cover/take 95% of the total area under this curve. But using, optimize(), how can I find the SHORTEST (in x-value units) of these many possible 95% intervals? What then would be the corresponding x-values for the the two ends of this shortest 95% interval?

Note: The idea of shortest interval for a uni-modal curve like mine makes sense. In reality, the shortest one would be the one that tends to be toward the middle where the height (y-value) is larger, so then x-value doesn't need to be so large for the intended interval to cover/take 95% of the total area under the curve.

Here is my R code (please run the entire code):

ppp <- function(f, N, df1, df2, petasq, alpha, beta) {

 pp <- function(petasq) dbeta(petasq, alpha, beta)
 ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )

 marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]

po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )

}
## @@@ END OF MY R FUNCTION.

# Now I use my function above to get the y-values for my plot:

petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
f  <- 30       # a function needed argument
df1 <- 3       # a function needed argument
df2 <- 108     # a function needed argument
N  <- 120      # a function needed argument
alpha = 5      # a function needed argument
beta = 4       # a function needed argument


## Now use the ppp() function to get the Y-values for the X-value range above:
y.values <- ppp(f, N, df1, df2, petasq, alpha, beta)

## Finally plot petasq (as X-values) against the Y.values:
plot(petasq, y.values, ty="l", lwd = 3 )

Solution

Based on your revised question, I found the optimization that minimizes the SHORTEST distance (in x-value units) between LEFT and RIGHT boundaries:

ppp <- function(petasq, f, N, df1, df2, alpha, beta) {

 pp <- function(petasq) dbeta(petasq, alpha, beta)
 ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )

 marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]

po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )
}

petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
f  <- 30       # a function needed argument
df1 <- 3       # a function needed argument
df2 <- 108     # a function needed argument
N  <- 120      # a function needed argument
alpha = 5      # a function needed argument
beta = 4       # a function needed argument

optim_func <- function(x_left) {
    int_function <- function(petasq) {
        ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
    }

    # For every LEFT value, find the corresponding RIGHT value that gives 95% area.  

    find_95_right <- function(x_right) {
        (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
    }
    x_right_obj <- optimize(f=find_95_right, interval=c(0.5,1))
    if(x_right_obj$objective > .Machine$double.eps^0.25) return(100)

    #Return the DISTANCE BETWEEN LEFT AND RIGHT
    return(x_right_obj$minimum - x_left)
}

#MINIMIZE THE DISTANCE BETWEEN LEFT AND RIGHT
x_left <- optimize(f=optim_func, interval=c(0.30,0.40))$minimum
find_95_right <- function(x_right) {
    (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
}
    int_function <- function(petasq) {
        ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
    }
x_right <- optimize(f=find_95_right, interval=c(0.5,1))$minimum

See the comments in the code. Hopefully this finally satisfies your question :) Results:

> x_right
[1] 0.5409488
> x_left
[1] 0.3201584

Also, you can plot the distance between LEFT and RIGHT as a function of the left boundary:

left_x_values <- seq(0.30, 0.335, 0.0001)
DISTANCE <- sapply(left_x_values, optim_func)

plot(left_x_values, DISTANCE, type="l")