restimationfitdistrplusvgam

Shape estimation in truncated Pareto in R


library(VGAM)
library(fitdistrplus)

fitdist(u_NI$k_u, 'truncpareto',
         start = list(lower=1,
                      upper=42016,
                      shape=1)) -> fit.k_u

length(u_NI$k_u) = 637594

I got this error:

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  : 
  the function mle failed to estimate the parameters, 
                with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The ptruncpareto function should return a zero-length vector when input has length zero

Is the problem in the excessive size of the dataset or in the start parameters?

REPRODUCIBILE EXAMPLE:

library(VGAM)
library(fitdistrplus)

rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
        start = list(lower=1,
                     upper=100,
                     shape=1.5))

This is not going to work, and I don't understand why.

It seems it has a problem here:

argument 'lower' must be positive


Solution

  • Part of your problem is a more general issue, i.e. that for a truncated distribution the MLE of the boundary parameter(s) is in general equal to the observed min/max of the data set. So you should always be able to do at least as well by setting the values of the lower/upper bounds equal to the min/max (based on my experience trying this they have to be slightly below/above the observed bounds). (I also found I had to set lower = 0 to stop the algorithm from trying negative values of the shape parameter.)

    library(VGAM); library(fitdistrplus)
    set.seed(101)
    rtruncpareto(100,1,100,1.5) -> a
    eps <- 1e-8
    fitdist(a, "truncpareto",
            start = list(shape=1.5),
            fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
            lower = 0)
    Parameters:
          estimate Std. Error
    shape 1.349885  0.1554436
    Fixed parameters:
              value
    lower  1.006844
    upper 25.906577
    

    An alternative to fitdist would be bbmle:

    library(bbmle)
    m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
                                upper = max(a) + eps,
                                shape = exp(lshape)),
               start = list(lshape = 0),
               data = data.frame(a),
             method = "BFGS")
    exp(coef(m1))   
      lshape 
    1.349884 
    

    bbmle is a little bit more flexible and allows you to fit the shape parameter on the log scale, which is generally more robust (and makes the standard deviation estimates more useful). Using method = "BFGS" here because the default Nelder-Mead algorithm works poorly for 1-dimensional optimization; could also use method = "Brent" (which would be more efficient and robust) but would then need to provide explicit lower and upper bounds for the lshape parameter.