[SOLVED] R - "CAPdiscrim" and "lda" error "variable 1 appears to be constant within groups"

I've been trawling through existing questions for a solution to this one, but to no avail.

I have a dataset of 117 individuals, each with one observation for each of 12 variables, grouped by a factor variable with 8 levels.

I would like to do a canonical analysis of principal co-ordinates on these data based on the Anderson and Willis approach. I started by using BiodiversityR::CAPdiscrim. Let's start with some example data:

individual <- c(1:30)
group <- rep(c("a","b","c"), 10)
Var1 <- rnorm(n = 30, mean = 3.0e-4,sd = 2.0e-6)
Var2 <- rnorm(n = 30, mean = 2.4e-4,sd = 2.0e-6)
Var3 <- rnorm(n = 30, mean = 7.0e-6,sd = 9.0e-9)
Var4 <- rnorm(n = 30, mean = 4.2e-5,sd = 1.0e-6)
Var5 <- rnorm(n = 30, mean = 1.0e-4,sd = 9.0e-6)
Var6 <- rnorm(n = 30, mean = 8.0e-5,sd = 1.0e-5)

# Build the data frame directly; going through cbind() would coerce every column to character
df <- data.frame(individual, group = factor(group), Var1, Var2, Var3, Var4, Var5, Var6)

CAPdiscrim requires data in a particular format:

vars <- df[3:8]

Now we can run CAPdiscrim on the data:

BiodiversityR::CAPdiscrim(vars~group,
                          data = df,
                          dist = "euclidean",
                          axes = 4,
                          m = 0,
                          permutations = 999)

Which returns:

Error in lda.default(x, grouping, ...) : variable 1 appears to be constant within groups

We can use caret::nearZeroVar to check whether this is true (it appears not to be):

vars_check <- nearZeroVar(vars, saveMetrics = TRUE, names = TRUE)
vars_check

     freqRatio percentUnique zeroVar   nzv
Var1         1           100   FALSE FALSE
Var2         1           100   FALSE FALSE
Var3         1           100   FALSE FALSE
Var4         1           100   FALSE FALSE
Var5         1           100   FALSE FALSE
Var6         1           100   FALSE FALSE
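
A possible explanation for the mismatch, sketched below under assumptions: as far as I can tell, lda() compares each variable's within-group standard deviation against its tol argument (default 1.0e-4) as an absolute threshold, whereas nearZeroVar() looks at the frequency and uniqueness of values. The snippet uses only base R, with a fixed seed and two of the example variables; the name within_sd is just illustrative.

```r
set.seed(1)
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(n = 30, mean = 3.0e-4, sd = 2.0e-6),
  Var3 = rnorm(n = 30, mean = 7.0e-6, sd = 9.0e-9)
)

# Within-group SD: subtract each observation's group mean, then take the SD
# of the residuals (ave() returns the group mean for every row by default)
within_sd <- sapply(vars, function(v) sd(v - ave(v, group)))
within_sd

# Compare against lda()'s default tolerance of 1e-4
within_sd < 1e-4
```

With variables on this scale, the within-group spread sits far below 1.0e-4 even though every value is unique, which would be consistent with the error message.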

I saw other questions about this error that were specific to lda(), and I noticed that CAPdiscrim() calls vegdist(), cmdscale() and lda(), so I tried to break the analysis down piece by piece:

library(vegan)  # vegdist()
library(MASS)   # lda()

# Euclidean distance matrix between individuals
dist_matrix <- vegdist(vars,
                       method = "euclidean",
                       binary = FALSE,
                       diag = FALSE,
                       upper = FALSE,
                       na.rm = TRUE)

PCA_vars <- cmdscale(d = dist_matrix,
                       k = 5,
                       eig = TRUE,
                       add = FALSE,
                       x.ret = FALSE)

LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group)

Which returns a very similar result:

Error in lda.default(x, grouping, ...) : variables 1 2 3 4 5 appear to be constant within groups

lda() has a "tol" argument (default 1.0e-4) that can be lowered to avoid this error when dealing with very small numbers, so I can do this:

LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group,
                  tol = 1.0e-25)

This provides some output, but doesn't include some of the features of CAPdiscrim such as allowing the function to determine the best number for "m" through permutations.

Can anyone suggest how to modify the tolerance in CAPdiscrim(), or how to carry out manually, with these other functions, what CAPdiscrim() is doing under the hood?

Any insight would be greatly appreciated.

Solution

The author of BiodiversityR::CAPdiscrim has fixed the problem, and the fix has been rolled out in subsequent package updates. The cause was that some error checks relied on absolute thresholds, which make sense from an ecological perspective, rather than on thresholds relative to the scale of the input data.
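
For anyone stuck on an older package version, one possible workaround is to rescale the variables before the analysis and to pick the number of axes m by hand. The sketch below is only an illustration under assumptions, not what CAPdiscrim() does internally: scale_factor is an arbitrary constant, stats::dist() stands in for vegan::vegdist(method = "euclidean"), and leave-one-out classification success from lda(CV = TRUE) is used as the criterion for m, in the spirit of the Anderson and Willis approach. Multiplying every variable by the same constant only rescales the Euclidean distances and the principal co-ordinates, so it should not change the discrimination itself, only whether the tolerance check passes.

```r
library(MASS)  # provides lda()

set.seed(42)
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(n = 30, mean = 3.0e-4, sd = 2.0e-6),
  Var2 = rnorm(n = 30, mean = 2.4e-4, sd = 2.0e-6),
  Var3 = rnorm(n = 30, mean = 7.0e-6, sd = 9.0e-9),
  Var4 = rnorm(n = 30, mean = 4.2e-5, sd = 1.0e-6),
  Var5 = rnorm(n = 30, mean = 1.0e-4, sd = 9.0e-6),
  Var6 = rnorm(n = 30, mean = 8.0e-5, sd = 1.0e-5)
)

# Assumed scale factor: any constant that lifts the within-group standard
# deviations well above lda()'s default tol of 1e-4 would serve
scale_factor <- 1e6
vars_scaled <- vars * scale_factor

# Principal co-ordinates from Euclidean distances
# (dist() is a stand-in for vegdist(method = "euclidean"))
pcoa <- cmdscale(dist(vars_scaled, method = "euclidean"), k = 5, eig = TRUE)

# Leave-one-out classification success for each candidate number of axes m
max_m <- ncol(pcoa$points)
loo_success <- sapply(seq_len(max_m), function(m) {
  fit <- lda(x = pcoa$points[, seq_len(m), drop = FALSE],
             grouping = group,
             CV = TRUE)
  mean(as.character(fit$class) == as.character(group))
})

# Keep the m with the highest leave-one-out success and refit without CV
best_m <- which.max(loo_success)
final_fit <- lda(x = pcoa$points[, seq_len(best_m), drop = FALSE],
                 grouping = group)
```

Because the example groups are drawn from the same distribution, the success rates here carry no meaning beyond showing that the calls run with the default tol; with real data, loo_success would be the quantity to inspect before settling on m.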