I am trying to use the R function glmmTMB::glmmTMB() to fit a generalized linear mixed-effects model to a dataset. However, I get a warning that reads "Warning message: In (function (start, objective, gradient = NULL, hessian = NULL, : NA/NaN function evaluation". I am unsure whether this warning is important or how to address it.
Two notes:
1. A similar warning is described in the glmmTMB troubleshooting vignette (under "NA/NaN function evaluation" near the bottom of the page). I am unsure if these are the same warning given with slightly different text. Link: https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html
2. Including control = glmmTMBControl(rank_check = "adjust") in my call to glmmTMB() has no effect on this warning. Link: Problem with glmmTMB function in R: gives NaN in summary

A reproducible example is provided below. File 'testing_insect_abund2.csv' can be downloaded from GitHub at this link: https://github.com/albrechtcf/SO_glmm_warning_data/blob/main/testing_insect_abund2.csv
These data are the abundances (integer counts) of a particular insect species (number/hectare) in a series of survey quadrats. Surveys were carried out in three study sites (i.e., three separate forested areas separated by a minimum of 40 kilometers). Within each study site, quadrats were arranged in trios associated with six individual forest canopy gaps (i.e., each of six canopy gaps per study site had three quadrats, for a total of 18 quadrats per site or 54 quadrats in total across all sites). Each row corresponds with one survey quadrat. Environmental data (elevation, forest canopy openness, and forest basal area) are included. Note that this is a dummy dataset in the format of my actual dataset - variable names and values have been edited for data security.
Variable definitions:
study_site: the general areas in which surveys were carried out. Categorical variable with three levels.
canopy_gap_ID: an identifier for each canopy gap. Categorical variable. Values are not necessarily unique (e.g., canopy gap ID = 2 is true for quadrats in both study site #1 and study site #3).
elevation: elevation (m) of a quadrat relative to a baseline value. Negative values are not below sea level, just below an arbitrary elevation treated as zero. Continuous variable.
canopy_openness: a measure of how open the forest canopy is (in percent, with 0% being full closure and 100% being no canopy at all). Continuous variable.
ba_total: forest basal area around a quadrat (square meters of tree cross-section per hectare). Integers.
abund: species abundance (number/hectare). Integers.
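Because canopy_gap_ID values repeat across study sites, the grouping term in the model is written as study_site:canopy_gap_ID. A minimal sketch of why that matters, using a made-up design table (gap IDs 1 to 6 reused in every site is an assumption for illustration, not the real data): combining the two factors yields 18 distinct groups, whereas canopy_gap_ID alone yields only 6.

```r
# Hypothetical design: 3 sites x 6 gaps x 3 quadrats, with gap IDs reused in each site
design <- expand.grid(
  quadrat = 1:3,
  canopy_gap_ID = factor(1:6),
  study_site = factor(paste0("studysite", 1:3))
)

# Number of groups if canopy_gap_ID were used on its own
n_gap_only <- nlevels(design$canopy_gap_ID)

# Number of groups when each gap is identified within its study site
nested <- interaction(design$study_site, design$canopy_gap_ID, drop = TRUE)
n_nested <- nlevels(nested)

print(c(rows = nrow(design), gap_only = n_gap_only, nested = n_nested))
```

The 18 groups here correspond to the "study_site:canopy_gap_ID, 18" reported in the model output.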
Console:
> library(glmmTMB)
> library(tidyverse)
> dataIN <- read.csv("testing_insect_abund2.csv") %>%
+ dplyr::mutate(canopy_gap_ID = as.factor(canopy_gap_ID))
> head(dataIN)
  study_site canopy_gap_ID  elevation canopy_openness ba_total abund
1 studysite1             2 -19.134308       31.952001       20     0
2 studysite1             2 -33.968140        7.531575       80     0
3 studysite1             2 -52.965713        9.731771       50   257
4 studysite1             3   8.137726       16.040343       90     0
5 studysite1             3  -3.274323        7.346171      100     0
6 studysite1             3 -10.232712        6.932909       90     0
> glmmTMB(abund ~ elevation + canopy_openness + ba_total + elevation*canopy_openness + (1 | study_site:canopy_gap_ID),
+ family = poisson(link = "log"),
+ data = dataIN)
Formula: abund ~ elevation + canopy_openness + ba_total + elevation * canopy_openness + (1 | study_site:canopy_gap_ID)
Data: dataIN
     AIC      BIC    logLik df.resid
7585.662 7597.596 -3786.831       48
Random-effects (co)variances:
Conditional model:
Groups Name Std.Dev.
study_site:canopy_gap_ID (Intercept) 12.99
Number of obs: 54 / Conditional model: study_site:canopy_gap_ID, 18
Fixed Effects:
Conditional model:
              (Intercept)                  elevation            canopy_openness                   ba_total  elevation:canopy_openness
               -1.368e+01                  8.103e-03                  6.163e-02                  4.958e-02                  4.093e-04
Warning message:
In (function (start, objective, gradient = NULL, hessian = NULL, :
NA/NaN function evaluation
Is it necessary to address this warning message? If so, how should I do so?
In general this warning is not something you need to worry about (although, for clarity, it's always good to see whether there is some way to get rid of it other than suppressWarnings()). It just means that at some point during the model-fitting process R tried a set of parameters that gave rise to a NaN negative log-likelihood value. The fitting algorithms are robust enough that one (or a few) such instances won't break them.
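To make this concrete, here is a minimal base-R sketch (not glmmTMB itself) of how an optimizer can run into a NaN objective value. It assumes a toy normal model with simulated data; the names y, nll, and msgs are made up for illustration. The wording of the warning in the question matches the one stats::nlminb() issues when its objective returns NaN, and whether it actually appears in this toy run depends on the path the optimizer takes, so the code only collects warnings rather than promising one.

```r
set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)  # simulated data, purely illustrative

# Negative log-likelihood of a normal model; par = c(mean, sd)
nll <- function(par) -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))

# A negative sd is invalid, so the objective evaluates to NaN there
bad_value <- suppressWarnings(nll(c(3, -1)))
print(is.nan(bad_value))

# Run the unconstrained optimizer and collect any warnings instead of printing them
msgs <- character(0)
fit <- withCallingHandlers(
  nlminb(start = c(0, 1), objective = nll),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(unique(msgs))  # may or may not include "NA/NaN function evaluation"
print(fit$par)       # estimates of mean and sd
```

If the optimizer wanders into the invalid region, it treats that evaluation as a bad step and moves on, which is why an occasional warning of this kind is usually harmless.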
You can often get rid of such warnings by scaling the predictor variables:
# g is assumed to be the fitted model from the question, stored as g <- glmmTMB(...)
datasc <- datawizard::standardize(dataIN, exclude = "abund")
g2 <- update(g, data = datasc)
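If you prefer not to add a dependency, a rough base-R equivalent of the scaling step might look like the sketch below. It assumes that centering each predictor on its mean and dividing by its standard deviation is what you want, and it uses only the six rows shown by head() in the question as stand-in data; dat, preds, and dat_sc are illustrative names.

```r
# Stand-in data: the six rows printed by head(dataIN) in the question
dat <- data.frame(
  elevation = c(-19.134308, -33.968140, -52.965713, 8.137726, -3.274323, -10.232712),
  canopy_openness = c(31.952001, 7.531575, 9.731771, 16.040343, 7.346171, 6.932909),
  ba_total = c(20, 80, 50, 90, 100, 90),
  abund = c(0, 0, 257, 0, 0, 0)
)

# Standardize the predictors only; the response (abund) is left untouched
preds <- c("elevation", "canopy_openness", "ba_total")
dat_sc <- dat
dat_sc[preds] <- lapply(dat[preds], function(x) as.numeric(scale(x)))

print(round(colMeans(dat_sc[preds]), 10))  # each predictor now has mean 0
print(sapply(dat_sc[preds], sd))           # and standard deviation 1
```

Keep in mind that coefficients from a model fitted to scaled predictors are per standard deviation rather than per original unit.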
However, there are several issues to highlight about this model/data set (so maybe the warning is a good thing).
80% of your abundances are zero, and the rest range very widely (240 to 6920).
table(dataIN$abund)
   0  240  242  244  248  250  257  732  990  994 1472 6920
  43    1    1    1    1    1    1    1    1    1    1    1
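The figures quoted above can be checked directly. The sketch below rebuilds the abundance vector from the table (so it runs without the CSV file) and computes the share of zeros and the range of the nonzero counts; abund, prop_zero, and nonzero_range are illustrative names.

```r
# Rebuild the response from the table: 43 zeros plus 11 nonzero counts
abund <- c(rep(0, 43), 240, 242, 244, 248, 250, 257, 732, 990, 994, 1472, 6920)

prop_zero <- mean(abund == 0)              # share of quadrats with no insects
nonzero_range <- range(abund[abund > 0])   # smallest and largest nonzero count

print(length(abund))
print(round(prop_zero, 3))
print(nonzero_range)
```

With the real data, the same two lines applied to dataIN$abund should give the same summary.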
To be blunt, you're probably not going to be able to get very much out of these data (regardless of how much effort they took to collect ...):