rfunctionregressionformula

Error in str2lang(x) trying to use nls() within a function


I am trying to fit an exponential decay curve to some biological data using nls() within a function (I want to do this multiple times).

I receive an error with the way I am specifying nls() which I can't work out:

Error in str2lang(x) : <text>:2:0: unexpected end of input
1: ~ 
   ^

A small dummy dataset and the code I have written is below - any help would be greatly appreciated.

dat_small <-  structure(list(actual_time = c(5, 9, 30, 59, 119, 171, 216), 
                             activity = c(7158, 7386, 5496, 3884, 1502, 819, 409)), row.names = c(NA, 
                                                                                                  -7L), class = c("tbl_df", "tbl", "data.frame"))
curve_fit <-  function(x, y, df) {
  #browser()
  # Mono-exponential model for activity using actual sampling time
  mod <-  lm(log(df[[y]]) ~ df[[x]], data = df) # get starting values from log-linear model
  C0 <-  as.numeric(exp(mod$coef[1])) # exponential of log-linear intercept
  lambda1 <-  as.numeric(abs(mod$coef[2])) # absolute value of slope of log-linear model
  nls_mod_mono <-  nls(df[[y]] ~ C0*exp(-lambda1*df[[x]]), start = c(C0 = C0, lambda1 = lambda1), data = df)
  summary(nls_mod_mono) # estimate model
  xNew <- seq(0, 240, length.out = 100) # new grid of times
  yNew <- predict(nls_mod_mono, list(x = xNew)) # predict activity
  dfNew <-  data.frame(x = xNew, y = yNew)
  p_mono <-  ggplot(dfNew, aes(x = x, y = y)) +
    geom_line() +
    geom_point(data = df, aes(x = x, y = y), size = 3) +
    xlab("Actual Sampling Time (mins)") + ylab("Activity (Bq)") +
    theme_bw(base_size = 20)
  mono_half_life <-  0.693/as.numeric(coef(nls_mod_mono)[2]) # half-life
  mono_auc <-  as.numeric(coef(nls_mod_mono)[1])/as.numeric(coef(nls_mod_mono)[2]) # AUC to infinity (can be estimated in mono case by simply dividing the intercept by lambda)
  results <-  c(p_mono, mono_half_life, mono_auc)
}
curve_fit("actual_time", "activity", dat_small)

Solution

  • Questions to SO should have minimal code focused on the problem at hand. Here the question is how to avoid the error in the nls call so everything after that is not relevant and we omit it. (We do point out, however, that aes in ggplot2 accepts unquoted variables and since x and y are character write it like this aes(.data[[x]], .data[[y]]) .)

    1) An easy fix is probably just to avoid the problem by defining a data frame with fixed x and y column names and then use that.

    curve_fit1 <- function(x, y, df) {
      data <- data.frame(x = df[[x]], y = df[[y]])
      co <- coef(lm(log(y) ~ x, data))
      st <- c(C0 = exp(co[[1]]), lambda1 = -co[[2]])
      nls(y ~ C0 * exp(-lambda1 * x), data = data, start = st) 
    }
    
    curve_fit1("actual_time", "activity", dat_small)
    

    giving

    Nonlinear regression model
      model: y ~ C0 * exp(-lambda1 * x)
       data: data
           C0   lambda1 
    8.003e+03 1.301e-02 
     residual sum-of-squares: 270020
    
    Number of iterations to convergence: 4 
    Achieved convergence tolerance: 4.228e-06
    

    2) Even easier is to use the NLS.expoDecay self starting function in the statforbiology package in which case no starting value is needed. The output is as in (1).

    library(statforbiology)
    
    curve_fit2 <- function(x, y, df) {
      data <- data.frame(x = df[[x]], y = df[[y]])
      nls(y ~ NLS.expoDecay(x, C0, lambda1), data)
    }
    
    curve_fit2("actual_time", "activity", dat_small)
    

    3) We could alternately use substitute to substitute the variables into the formula. This can apply to either (1) or (2). We will show it as applied to (2).

    library(statforbiology)
    
    curve_fit3 <- function(x, y, df) {
      fo <- substitute(y ~ NLS.expoDecay(x, C0, lambda1), 
        list(x = as.name(x), y = as.name(y)))
      nls(fo, df)
    }
    
    curve_fit3("actual_time", "activity", dat_small)
    

    giving

    Nonlinear regression model
      model: activity ~ NLS.expoDecay(actual_time, C0, lambda1)
       data: df
           C0   lambda1 
    8.003e+03 1.301e-02 
     residual sum-of-squares: 270020
    
    Number of iterations to convergence: 4 
    Achieved convergence tolerance: 4.228e-06
    

    4) Yet another altertnative is to use the drc package together with DRC.exponDecay from statforbiology. In this case the simple form of formula accepted allows us to use reformulate to generate it and furthermore the drc class object produced from drm has coef, plot and other methods -- try methods(class = "drc") .

    library(drc)
    library(statforbiology)
    
    curve_fit4 <- function(x, y, df) {
      drm(reformulate(x, y), data = df, fct = DRC.expoDecay())
    }
    
    res <- curve_fit4("actual_time", "activity", dat_small)
    plot(res)
    

    screenshot