I am using optim
to fit various probability distributions to two given tertiles t1
and t2
. The following function works, where qfun
is the quantile function of the distribution and par_start
gives the initial values of the parameters of the distribution:
tertile_fit <- function(t1, t2, qfun, par_start) {
gapfn <- function(par, t1, t2, qfun) {
a <- t1 - do.call(qfun, as.list(c(1/3, par)))
b <- t2 - do.call(qfun, as.list(c(2/3, par)))
a ^ 2 + b ^ 2
}
result <- optim(par = par_start, gapfn, t1 = t1, t2 = t2, qfun = qfun)
if (result$convergence != 0) {
stop("Optmisation failed")
} else {
result$par %>%
as.list
}
}
The length of par_start
determines how many of the distribution's parameters are varied in the optimisation but very much restricts the choice: if length(par_start) = 3
for example then the optimisation will be made over the first 3 parameters in formals(qfun)
.
My problem is that for some distributions I want to be able to choose which parameters are to be fixed. I could solve that problem if I were writing specific code for each distribution by amending the do.call
lines in gapfn
, to include the names and values of the parameters I fix, for example:
do.call(qfun, as.list(c(1/3, mean = 0, par)))
with par
now the parameters that are allowed to vary.
But what I want is a function that can take any probability distribution and fix named parameters. It might look something like this:
new_tertile_fit <- function(t1, t2, qfun, fixed_params, par_start)
where fixed_params
is a data.frame(?) or a list(?) giving the names of the parameters and the values at which they are fixed.
What is giving me trouble is extracting that information from fixed_params
so that the do.call
lines can include the specific statements along the lines par1 = 2
if par1
is one of the parameters of the distribution being fitted.
I'd suggest a parameter accepting a logical vector of length length(par_start)
indicating if the corresponding parameter is supposed to be fixed:
tertile_fit <- function(t1, t2, qfun, par_start, fixed = logical(length(par_start))) {
gapfn <- function(par, t1, t2, qfun) {
par_start[!fixed] <- par
a <- t1 - do.call(qfun, as.list(c(1/3, par_start)))
b <- t2 - do.call(qfun, as.list(c(2/3, par_start)))
a ^ 2 + b ^ 2
}
result <- optim(par = par_start[!fixed], gapfn, t1 = t1, t2 = t2, qfun = qfun)
if (result$convergence != 0) {
stop("Optmisation failed")
} else {
par_start[!fixed] <- result$par
}
par_start
}
# test with three-parameter beta (with non-centrality)
# don't fix any parameters
(par <- tertile_fit(0.5, 0.8, qbeta, c(1, 1, 1)))
#> [1] 1.0686711 0.8931586 1.0555377
qbeta((1:2)/3, par[1], par[2], par[3])
#> [1] 0.4999925 0.8000041
# fix the non-centrality parameter
(par <- tertile_fit(0.5, 0.8, qbeta, c(1, 1, 2), !2:0))
#> [1] 0.8132744 0.9571455 2.0000000
qbeta((1:2)/3, par[1], par[2], par[3])
#> [1] 0.4999997 0.7999982
However, if you try to fix one of two parameters, you'll get a warning (and you'll probably not fit your tertiles since you have one parameter and two constraints):
# test with gamma distribution, fixing the rate parameter
(par <- tertile_fit(0.5, 2, qgamma, c(1, 10), 0:1))
#> Warning in optim(par = par_start[!fixed], gapfn, t1 = t1, t2 = t2, qfun = qfun): one-dimensional optimization by Nelder-Mead is unreliable:
#> use "Brent" or optimize() directly
#> [1] 13.12813 10.00000
qgamma((1:2)/3, par[1], par[2])
#> [1] 1.131999 1.439610