rfunctionoptimizationbayesiancredible-interval

Using optimize() to find the shortest interval that takes 95% area under a curve in R


Background:

I have a curve whose Y-values are produced by my small R function below (neatly annotated). If you run my entire R code, you see my curve (but remember, it's a function so if I changed the argument values, I could get a different curve):

enter image description here

Question:

Obviously, one can determine/assume many intervals that would cover/take 95% of the total area under this curve. But using, optimize(), how can I find the SHORTEST (in x-value units) of these many possible 95% intervals? What then would be the corresponding x-values for the the two ends of this shortest 95% interval?

Note: The idea of shortest interval for a uni-modal curve like mine makes sense. In reality, the shortest one would be the one that tends to be toward the middle where the height (y-value) is larger, so then x-value doesn't need to be so large for the intended interval to cover/take 95% of the total area under the curve.

Here is my R code (please run the entire code):

ppp <- function(f, N, df1, df2, petasq, alpha, beta) {

 pp <- function(petasq) dbeta(petasq, alpha, beta)
 ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )

 marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]

po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )

}
## @@@ END OF MY R FUNCTION.

# Now I use my function above to get the y-values for my plot:

petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
f  <- 30       # a function needed argument
df1 <- 3       # a function needed argument
df2 <- 108     # a function needed argument
N  <- 120      # a function needed argument
alpha = 5      # a function needed argument
beta = 4       # a function needed argument


## Now use the ppp() function to get the Y-values for the X-value range above:
y.values <- ppp(f, N, df1, df2, petasq, alpha, beta)

## Finally plot petasq (as X-values) against the Y.values:
plot(petasq, y.values, ty="l", lwd = 3 )

Solution

  • Based on your revised question, I found the optimization that minimizes the SHORTEST distance (in x-value units) between LEFT and RIGHT boundaries:

    ppp <- function(petasq, f, N, df1, df2, alpha, beta) {
    
     pp <- function(petasq) dbeta(petasq, alpha, beta)
     ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )
    
     marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]
    
    po <- function(x) pp(x)*ll(x) / marg
    return(po(petasq) )
    }
    
    petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
    f  <- 30       # a function needed argument
    df1 <- 3       # a function needed argument
    df2 <- 108     # a function needed argument
    N  <- 120      # a function needed argument
    alpha = 5      # a function needed argument
    beta = 4       # a function needed argument
    
    optim_func <- function(x_left) {
        int_function <- function(petasq) {
            ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
        }
    
        # For every LEFT value, find the corresponding RIGHT value that gives 95% area.  
    
        find_95_right <- function(x_right) {
            (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
        }
        x_right_obj <- optimize(f=find_95_right, interval=c(0.5,1))
        if(x_right_obj$objective > .Machine$double.eps^0.25) return(100)
    
        #Return the DISTANCE BETWEEN LEFT AND RIGHT
        return(x_right_obj$minimum - x_left)
    }
    
    #MINIMIZE THE DISTANCE BETWEEN LEFT AND RIGHT
    x_left <- optimize(f=optim_func, interval=c(0.30,0.40))$minimum
    find_95_right <- function(x_right) {
        (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
    }
        int_function <- function(petasq) {
            ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
        }
    x_right <- optimize(f=find_95_right, interval=c(0.5,1))$minimum
    

    See the comments in the code. Hopefully this finally satisfies your question :) Results:

    > x_right
    [1] 0.5409488
    > x_left
    [1] 0.3201584
    

    Also, you can plot the distance between LEFT and RIGHT as a function of the left boundary:

    left_x_values <- seq(0.30, 0.335, 0.0001)
    DISTANCE <- sapply(left_x_values, optim_func)
    
    plot(left_x_values, DISTANCE, type="l")
    

    plot