Maximum-Likelihood Estimation of three parameter reverse Weibull model implementation in R

I'm implementing a Maximum-Likelihood estimation in R for a three parameter reverse Weibull model and have some troubles to get plausible results, which include: Bad optimization results, unwanted optimx behaviour. Beside these I wonder, how I could make use of parscale in this model.

Here is my implementation attempt:

To generate data I use the probabilty integral transform:

#Generate N sigma*RWei(alph)-mu distributed points        
gen.wei <- function(N, theta) {
      alph <- theta[1]
      mu <- theta[2]
      sigma <- theta[3]
      return(
        mu - sigma * (- log (runif(N)))**(1/alph)
      )
    }

Now I define the Log-Likelihood and negative Log-Likelihood to use optimx optimization:

#LL----
ll.wei <- function(theta,x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  val <- sum(ifelse(
    x <= mu,
    log(alph/sigma) + (alph-1) * log( (mu-x)/sigma) - ( (mu-x)/sigma)**(alph-1),
    -Inf
  ))
  return(val)
}
#Negative LL----
nll.wei <- function(theta,x) {
  return(-ll.wei(theta=theta, x=x))
         }

Afterwards I define the analytical gradient of the negative LL. Remark: There are points at which the negative LL isn't differentiable (the upper end-point mu)

gradnll.wei <- function(theta,x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  argn <- (mu-x)/sigma
  del.alph <- sum(ifelse(x <= mu,
    1/alph + log(argn) - log(argn) * argn**(alph-1),
    0
  ))
  del.mu <- sum(ifelse(x <= mu,
    (alph-1)/(mu-x) - (alph-1)/sigma * argn**(alph-2),
    0))
  del.sigma <- sum(ifelse(x <= mu,
    ((alph-1)*argn**(alph-1)-alph)/sigma,
    0))
  return (-c(del.alph, del.mu, del.sigma))
}

Finally I try to optimize using the optimx package and the methods Nelder-Mead (derivative free) and BFGS (my LL is kinda smooth, there's just one point, which is problematic).

      #MLE for Weibull
       mle.wei <- function(start,sample) {
      optimx(
        par=start,
        fn = nll.wei,
        gr = gradnll.wei,
        method = c("BFGS"),
        x = sample
      )
    }
    theta.s <- c(4,1,1/2) #test for parameters
    sample <- gen.wei(100, theta.s) #generate 100 data points distributed like theta.s
mle.wei(start=c(8,4, 2), sample) #MLE Estimation

To my surprise I get the following error:

Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower,  : 
  Cannot evaluate function at initial parameters

I checked manually: Both nll and gradnll are finite at the initial parameters... If i switch to optim instead of optimx I get a result, but a pretty bad one:

 $par
[1] 8.178674e-01 9.115766e-01 1.745724e-06

$value
[1] -1072.786

$counts
function gradient 
     574      100 

$convergence
[1] 1

$message
NULL

So it doesn't converge. If I don't supply the gradient to BFGS, there isn't a result. If I use Nelder-Mead instead:

$par
[1] 1.026393e+00 9.649121e-01 9.865624e-18

$value
[1] -3745.039

$counts
function gradient 
     502       NA 

$convergence
[1] 1

$message
NULL

So it is also very bad...

My questions are:

Should I instead of defining the ll outside of the support as -Inf give it a very high negative value like -1e20 to circumvent -Inf errors or does it not matter?
Like the first one but for the gradient: technically the ll isn't defined outside of the support but since the likelihood is 0 albeit constant outside of the support, is it smart to define the gradnll as 0 outside? 3.I checked the implementation of the MLE estimator fgev from the evd package and saw that they use the BFGS method but don't supply the gradient even though the gradient does exist. Therefore my question is, whether there are situations where it is contraproductive to supply the gradient since it isn't defined everywhere (like my and the evd case)?
I got an error of "argument x matches multiple formal arguments" type in optimx but not in optim, which surprised me. What am I doing wrong in supplying my functions and data to the optimx function?

Thank you very much in advance!

Solution

Re 3: That's kind of a bug in optimx, but one that's hard to avoid. It uses x as a variable name when calculating a numerical gradient; you also use it as an "additional parameter" to your functions. You can work around that by renaming your argument, e.g. call it xdata in your functions.

Re 1 & 2: There are several techniques to handle boundary problems in optimization. Setting to a big constant value tends not to work: if the optimizer goes out of bounds, it finds the objective function really flat. If the exact boundary is legal, then pushing the parameter to the boundary and adding a penalty sometimes works. If the exact boundary is illegal, you might be able to reflect: e.g. if mu > 0 is a requirement, sometimes replacing mu by abs(mu) in the objective function gets things to work. Sometimes the best solution is to get rid of the boundary by transforming the parameters.

Edited to add some more details:

For this problem, it looks to me as though transformations of the parameters might work. I think alpha and sigma must both be positive. Setting alpha <- exp(theta[1]) and sigma <- exp(theta[3]) will guarantee that. Limits on mu are harder, but I think mu > max(xdata) is needed, so mu <- max(xdata) + exp(theta[2]) should keep it in bounds. Of course, making these changes messes up your gradient formula and starting values.

As to resources: I'm afraid I don't know any. This advice is based on years of painful experience.