r fitdistrplus

How to return objects created in a function and ignore the ones with error/NA?



Goal

I want to keep only those objects that were successfully created and ignore those that threw errors.

Example

Please note that this is just a reproducible example. My original dataset is different.

The following function takes any variable of the mtcars dataset, fits three theoretical distributions to it, and then returns the goodness-of-fit statistics:

library(fitdistrplus)

fit_distt <- function(var) {

  v <- mtcars[, var]

  f1 <- fitdist(data = v, distr = "norm")

  f2 <- fitdist(data = v, distr = "nbinom")

  f3 <- fitdist(data = v, distr = "gamma")

  gofstat(f = list(f1, f2, f3),
          chisqbreaks = c(0, 3, 3.5, 4, 4.5,
                          5, 10, 20, 30, 40),
          fitnames = c("normal", "nbinom", "gamma"))

}

For instance:

> fit_distt("gear")
Goodness-of-fit statistics
                                normal    nbinom     gamma
Kolmogorov-Smirnov statistic 0.2968616 0.4967268 0.3030232
Cramer-von Mises statistic   0.4944390 1.5117544 0.5153004
Anderson-Darling statistic   3.1060083 7.2858460 3.1742713

Goodness-of-fit criteria
                                 normal   nbinom    gamma
Akaike's Information Criterion 74.33518 109.9331 72.07507
Bayesian Information Criterion 77.26665 112.8646 75.00655

Problem

Some theoretical distributions cannot be fitted to a given variable, and fitdist throws an error:

> fit_distt("mpg")
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
 Error in fitdist(data = v, distr = "nbinom") : 
  the function mle failed to estimate the parameters, 
                with the error code 100 

This error comes from f2, which tries to fit the nbinom distribution to the continuous variable mpg. The norm and gamma fits succeed.

I want to return the gofstat output for the distributions that were fitted successfully and ignore the ones that threw an error.

Expected output

Even though f2 is specified in the function, if it throws an error, I still want the following output:

> fit_distt("mpg")
Goodness-of-fit statistics
                                 normal      gamma
Kolmogorov-Smirnov statistic 0.12485059 0.08841088
Cramer-von Mises statistic   0.08800019 0.03793323
Anderson-Darling statistic   0.58886727 0.28886166

Goodness-of-fit criteria
                                 normal    gamma
Akaike's Information Criterion 208.7555 205.8416
Bayesian Information Criterion 211.6870 208.7731

What I tried

Obviously, I can just remove f2 from the function. But that means repeating all the code for each variable. That's a lot of code! So, I still want to use the function.

And I want to be able to use the function for any variable. With mtcars$mpg, the function fails for nbinom, but with mtcars$vs, it fails for gamma. In every case, I want to skip the fits that threw an error and report gofstat for the fits that worked.

I can use purrr::possibly to return either the fit or a default value instead of stopping at the error. But I don't know how to pass only the successful fits to gofstat.
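
For reference, the behaviour that purrr::possibly offers can be sketched in base R: wrap a function so that an error yields a default value instead of stopping. This is only an illustration under assumptions. The names possibly_base and toy_fit are hypothetical, and toy_fit merely stands in for fitdist so that the block runs without any package.

```r
# Hypothetical base-R analogue of purrr::possibly(): returns a wrapped
# function that yields `otherwise` instead of raising an error.
possibly_base <- function(f, otherwise = NULL) {
  function(...) tryCatch(f(...), error = function(e) otherwise)
}

# Toy stand-in for fitdist() (an assumption, not the real fitter):
# it refuses non-integer data for "nbinom".
toy_fit <- function(x, distr) {
  if (distr == "nbinom" && any(x != round(x))) {
    stop("function cannot be evaluated at initial parameters")
  }
  list(distr = distr, n = length(x))
}

safe_fit <- possibly_base(toy_fit)

ok  <- safe_fit(mtcars$mpg, "norm")    # a list describing the toy fit
bad <- safe_fit(mtcars$mpg, "nbinom")  # NULL, because toy_fit() stopped
```

A wrapped call like this never stops the enclosing function, but the NULL entries would still have to be dropped before anything is handed to gofstat.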


Solution

  • You could try with try. Fit each distribution inside try, and add the fit to the list you pass to gofstat only if it works:

    library(fitdistrplus)
    #> Loading required package: MASS
    #> Loading required package: survival
    
    
    fit_distt <- function(var) {
      
      v <- mtcars[, var]
      
      distributions <- c("norm", "nbinom", "gamma")
      
      fs <- list()
      fitted_distributions <- vector(mode = "character")
      for (i in seq_along(distributions)) {
        # try to fit the model
        fit <- try(fitdist(data = v, distr = distributions[i]), silent = TRUE)    
        
        # if it works, add it to fs. If not, ¯\_(ツ)_/¯
        if (!inherits(fit, "try-error")) {
          fs[[length(fs) + 1]] <- fit
          fitted_distributions[length(fitted_distributions) + 1] <- distributions[i]
        }
      }
      
      gofstat(f = fs,
              chisqbreaks = c(0, 3, 3.5, 4, 4.5, 
                              5, 10, 20, 30, 40),
              fitnames = fitted_distributions)
      
    }
    
    fit_distt("mpg")
    #> <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
    #> Goodness-of-fit statistics
    #>                                    norm      gamma
    #> Kolmogorov-Smirnov statistic 0.12485059 0.08841088
    #> Cramer-von Mises statistic   0.08800019 0.03793323
    #> Anderson-Darling statistic   0.58886727 0.28886166
    #> 
    #> Goodness-of-fit criteria
    #>                                    norm    gamma
    #> Akaike's Information Criterion 208.7555 205.8416
    #> Bayesian Information Criterion 211.6870 208.7731
    

    Created on 2020-10-07 by the reprex package (v0.3.0)
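
The same idea can also be written without the explicit loop. The sketch below is an alternative under stated assumptions, not the code from the answer above. fit_all and toy_fit are hypothetical names, toy_fit stands in for fitdist so that the block runs on base R alone, and the guard for the case where every fit fails assumes that an empty list is of no use downstream.

```r
# Hypothetical helper: apply `fitter` to each distribution name and keep
# only the fits that did not raise an error.
fit_all <- function(x, distributions, fitter) {
  fits <- lapply(distributions, function(d) {
    tryCatch(fitter(x, d), error = function(e) NULL)
  })
  names(fits) <- distributions
  fits <- Filter(Negate(is.null), fits)  # drop the failed fits
  if (length(fits) == 0) {
    stop("no distribution could be fitted")
  }
  fits
}

# Toy stand-in for fitdist() (an assumption): mimics two of the failures
# described in the question.
toy_fit <- function(x, distr) {
  if (distr == "nbinom" && any(x != round(x))) stop("nbinom needs counts")
  if (distr == "gamma" && any(x <= 0)) stop("gamma needs positive data")
  list(distr = distr, n = length(x))
}

dists <- c("norm", "nbinom", "gamma")
names(fit_all(mtcars$mpg, dists, toy_fit))  # "norm" "gamma"
names(fit_all(mtcars$vs,  dists, toy_fit))  # "norm" "nbinom"
```

With fitdistrplus loaded, function(x, d) fitdist(data = x, distr = d) could be passed as fitter, and the resulting list and its names could then be handed to gofstat as f and fitnames.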