I'm implementing maximum-likelihood estimation in R for a three-parameter reverse Weibull model and am having trouble getting plausible results. The problems include bad optimization results and unwanted optimx behaviour. Besides these, I wonder how I could make use of parscale in this model.
Here is my implementation attempt:
To generate data I use the probability integral transform:
# Generate N points distributed as mu - sigma * Weibull(alph)
gen.wei <- function(N, theta) {
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  return(
    mu - sigma * (-log(runif(N)))^(1/alph)
  )
}
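A quick way to sanity-check the generator, sketched here under the assumption that the intended model has distribution function exp(-((mu - x)/sigma)^alph) for x <= mu: under that model ((mu - x)/sigma)^alph is standard exponential, so its sample mean should be close to 1. The names theta.check, x.check and e.check, the seed, and the sample size are arbitrary choices for illustration.

```r
# Generator from above, repeated so the check runs on its own
gen.wei <- function(N, theta) {
  alph <- theta[1]; mu <- theta[2]; sigma <- theta[3]
  mu - sigma * (-log(runif(N)))^(1 / alph)
}

set.seed(1)                      # arbitrary seed, for reproducibility only
theta.check <- c(4, 1, 1/2)
x.check <- gen.wei(1e5, theta.check)

# Transform back: under the assumed model these behave like Exp(1) draws
e.check <- ((theta.check[2] - x.check) / theta.check[3])^theta.check[1]
mean(e.check)                    # should be close to 1
all(x.check <= theta.check[2])   # no point may exceed the upper end-point mu
```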
Now I define the Log-Likelihood and negative Log-Likelihood to use optimx optimization:
#LL----
ll.wei <- function(theta, x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  val <- sum(ifelse(
    x <= mu,
    log(alph/sigma) + (alph-1) * log((mu-x)/sigma) - ((mu-x)/sigma)^(alph-1),
    -Inf
  ))
  return(val)
}
#Negative LL----
nll.wei <- function(theta, x) {
  return(-ll.wei(theta=theta, x=x))
}
Afterwards I define the analytical gradient of the negative LL. Remark: the negative LL isn't differentiable where an observation coincides with the upper end-point mu.
gradnll.wei <- function(theta, x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  argn <- (mu-x)/sigma
  del.alph <- sum(ifelse(x <= mu,
    1/alph + log(argn) - log(argn) * argn^(alph-1),
    0
  ))
  del.mu <- sum(ifelse(x <= mu,
    (alph-1)/(mu-x) - (alph-1)/sigma * argn^(alph-2),
    0))
  del.sigma <- sum(ifelse(x <= mu,
    ((alph-1)*argn^(alph-1)-alph)/sigma,
    0))
  return(-c(del.alph, del.mu, del.sigma))
}
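One way to check an analytical gradient like this is to compare it with central finite differences at a point where every observation lies strictly below mu. The sketch below is self-contained and uses compact copies of the two functions above (same formulas, without the ifelse guard); the names nll.chk, grad.chk and num.grad, the step size h, and the test point theta0 are assumptions chosen for illustration.

```r
# Compact copies of nll.wei and gradnll.wei, valid only when all x < mu
nll.chk <- function(theta, x) {
  z <- (theta[2] - x) / theta[3]
  -sum(log(theta[1] / theta[3]) + (theta[1] - 1) * log(z) - z^(theta[1] - 1))
}
grad.chk <- function(theta, x) {
  alph <- theta[1]; mu <- theta[2]; sigma <- theta[3]
  z <- (mu - x) / sigma
  -c(sum(1 / alph + log(z) - log(z) * z^(alph - 1)),
     sum((alph - 1) / (mu - x) - (alph - 1) / sigma * z^(alph - 2)),
     sum(((alph - 1) * z^(alph - 1) - alph) / sigma))
}

# Central finite differences (hypothetical helper; h is an arbitrary step)
num.grad <- function(fn, theta, ..., h = 1e-6) {
  g <- numeric(length(theta))
  for (i in seq_along(theta)) {
    e <- replace(numeric(length(theta)), i, h)
    g[i] <- (fn(theta + e, ...) - fn(theta - e, ...)) / (2 * h)
  }
  g
}

set.seed(1)
x.chk <- 1 - 0.5 * (-log(runif(50)))^(1 / 4)   # simulated like gen.wei(50, c(4, 1, 1/2))
theta0 <- c(3, 2, 1)                            # test point with mu > max(x.chk)

# Largest absolute difference between analytical and numerical gradient
grad.diff <- max(abs(grad.chk(theta0, x.chk) - num.grad(nll.chk, theta0, x = x.chk)))
grad.diff
```

A small difference only says the gradient matches the function as coded; it says nothing about whether the likelihood itself is the intended one.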
Finally I try to optimize using the optimx package and the methods Nelder-Mead (derivative free) and BFGS (my LL is kinda smooth, there's just one point, which is problematic).
library(optimx)
#MLE for Weibull
mle.wei <- function(start, sample) {
  optimx(
    par = start,
    fn = nll.wei,
    gr = gradnll.wei,
    method = c("BFGS"),
    x = sample
  )
}
theta.s <- c(4, 1, 1/2) # true parameters (alph, mu, sigma) used for the test
sample <- gen.wei(100, theta.s) # generate 100 data points with parameters theta.s
mle.wei(start = c(8, 4, 2), sample) # run the ML estimation
To my surprise I get the following error:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
I checked manually: both nll and gradnll are finite at the initial parameters... If I switch to optim instead of optimx I get a result, but a pretty bad one:
$par
[1] 8.178674e-01 9.115766e-01 1.745724e-06
$value
[1] -1072.786
$counts
function gradient
574 100
$convergence
[1] 1
$message
NULL
So it doesn't converge. If I don't supply the gradient to BFGS, there isn't a result. If I use Nelder-Mead instead:
$par
[1] 1.026393e+00 9.649121e-01 9.865624e-18
$value
[1] -3745.039
$counts
function gradient
502 NA
$convergence
[1] 1
$message
NULL
So it is also very bad...
My questions are:
Thank you very much in advance!
Re 3: That's kind of a bug in optimx, but one that's hard to avoid. It uses x as a variable name when calculating a numerical gradient; you also use it as an "additional parameter" to your functions. You can work around that by renaming your argument, e.g. call it xdata in your functions.
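A minimal sketch of that rename, assuming the rest of the question's code stays as it is: the likelihood body is copied unchanged from the question and only the data argument becomes xdata. The variable name sample.data and the Nelder-Mead choice are just for illustration, and the optimx call is guarded so the snippet also runs where the package is not installed. This only addresses the error message, not the quality of the estimates.

```r
# Negative log-likelihood from the question; only x -> xdata has changed
nll.wei <- function(theta, xdata) {
  alph <- theta[1]; mu <- theta[2]; sigma <- theta[3]
  z <- (mu - xdata) / sigma
  ll <- ifelse(xdata <= mu,
               log(alph / sigma) + (alph - 1) * log(z) - z^(alph - 1),
               -Inf)
  -sum(ll)
}

set.seed(1)
sample.data <- 1 - 0.5 * (-log(runif(100)))^(1 / 4)   # simulated like gen.wei(100, c(4, 1, 1/2))

# The data are now passed as xdata, following the suggested rename
if (requireNamespace("optimx", quietly = TRUE)) {
  fit <- optimx::optimx(par = c(8, 4, 2), fn = nll.wei,
                        method = "Nelder-Mead", xdata = sample.data)
  print(fit)
}
```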
Re 1 & 2: There are several techniques to handle boundary problems in optimization. Setting the objective to a big constant value outside the bounds tends not to work: if the optimizer goes out of bounds, it finds the objective function completely flat there and gets no hint about which direction leads back. If the exact boundary is legal, then pushing the parameter to the boundary and adding a penalty sometimes works. If the exact boundary is illegal, you might be able to reflect: e.g. if mu > 0 is a requirement, sometimes replacing mu by abs(mu) in the objective function gets things to work. Sometimes the best solution is to get rid of the boundary by transforming the parameters.
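To illustrate the reflection idea on something simpler than the Weibull model, here is a small sketch that fits an exponential rate, which must be positive, by putting abs(rate) inside the objective. The data, seed, starting value, and the name nll.reflect are made up for the illustration; the closed-form estimate 1/mean(xdata) is shown alongside for comparison.

```r
set.seed(42)
xdata <- rexp(200, rate = 2)   # toy data; the true rate of 2 is an arbitrary choice

# Reflected objective: abs() maps any trial value back into rate > 0
nll.reflect <- function(rate, xdata) {
  -sum(dexp(xdata, rate = abs(rate), log = TRUE))
}

fit <- optim(par = 1, fn = nll.reflect, xdata = xdata, method = "BFGS")
rate.hat <- abs(fit$par)       # undo the reflection on the result as well

c(estimate = rate.hat, closed.form = 1 / mean(xdata))
```

The same abs() has to be applied to the returned parameter, since the optimizer may finish on the mirrored side.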
Edited to add some more details:
For this problem, it looks to me as though transformations of the parameters might work. I think alpha and sigma must both be positive. Setting alpha <- exp(theta[1]) and sigma <- exp(theta[3]) will guarantee that. Limits on mu are harder, but I think mu > max(xdata) is needed, so mu <- max(xdata) + exp(theta[2]) should keep it in bounds. Of course, making these changes messes up your gradient formula and starting values.
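As a rough sketch of what that could look like: the names nll.trans, to.theta and eta.start, the seed, and the starting values are my own choices for illustration, and the gradient is left to optim's numerical approximation. Note one assumption: I wrote the last term of the log-density as ((mu - xdata)/sigma)^alpha, which is what the distribution implied by gen.wei gives, whereas ll.wei in the question uses the exponent alph - 1 there.

```r
# Negative log-likelihood in unconstrained parameters eta (names are illustrative)
nll.trans <- function(eta, xdata) {
  alpha <- exp(eta[1])                 # alpha > 0
  mu    <- max(xdata) + exp(eta[2])    # mu > max(xdata)
  sigma <- exp(eta[3])                 # sigma > 0
  z <- (mu - xdata) / sigma
  -sum(log(alpha / sigma) + (alpha - 1) * log(z) - z^alpha)
}

# Map the optimizer's eta back to the original parameters
to.theta <- function(eta, xdata) {
  c(alpha = exp(eta[1]), mu = max(xdata) + exp(eta[2]), sigma = exp(eta[3]))
}

set.seed(1)
xdata <- 1 - 0.5 * (-log(runif(100)))^(1 / 4)   # simulated like gen.wei(100, c(4, 1, 1/2))

# Starting values on the transformed scale: alpha = 2, mu = max(xdata) + 1, sigma = 1
eta.start <- c(log(2), 0, 0)
fit <- optim(par = eta.start, fn = nll.trans, xdata = xdata, method = "BFGS")

est <- to.theta(fit$par, xdata)
est
```

The starting values now live on the transformed scale, which is why eta.start is written with log().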
As to resources: I'm afraid I don't know any. This advice is based on years of painful experience.