I've been trawling through existing questions for a solution to this one, but to no avail.
I have a dataset of 117 individuals, each with observations on 12 variables, grouped by a factor variable with 8 levels.
I would like to do a canonical analysis of principal co-ordinates on these data based on the Anderson and Willis approach. I started by using BiodiversityR::CAPdiscrim. Let's start with some example data:
individual <- c(1:30)
group <- rep(c("a","b","c"), 10)
Var1 <- rnorm(n = 30, mean = 3.0e-4,sd = 2.0e-6)
Var2 <- rnorm(n = 30, mean = 2.4e-4,sd = 2.0e-6)
Var3 <- rnorm(n = 30, mean = 7.0e-6,sd = 9.0e-9)
Var4 <- rnorm(n = 30, mean = 4.2e-5,sd = 1.0e-6)
Var5 <- rnorm(n = 30, mean = 1.0e-4,sd = 9.0e-6)
Var6 <- rnorm(n = 30, mean = 8.0e-5,sd = 1.0e-5)
# Build the data frame directly: wrapping the columns in cbind() coerces them all to
# character, and the as.numeric(levels(...)) round trip only works in R < 4.0.0,
# where stringsAsFactors defaulted to TRUE.
df <- data.frame(individual, group = factor(group), Var1, Var2, Var3, Var4, Var5, Var6)
CAPdiscrim requires data in a particular format:
vars <- df[3:8]
Now we can run CAPdiscrim on the data:
BiodiversityR::CAPdiscrim(vars ~ group,
                          data = df,
                          dist = "euclidean",
                          axes = 4,
                          m = 0,
                          permutations = 999)
Which returns:
Error in lda.default(x, grouping, ...) : variable 1 appears to be constant within groups
We can use caret::nearZeroVar to check whether this is true (it appears not to be):
vars_check <- caret::nearZeroVar(vars, saveMetrics = TRUE, names = TRUE)
vars_check
     freqRatio percentUnique zeroVar   nzv
Var1         1           100   FALSE FALSE
Var2         1           100   FALSE FALSE
Var3         1           100   FALSE FALSE
Var4         1           100   FALSE FALSE
Var5         1           100   FALSE FALSE
Var6         1           100   FALSE FALSE
I saw other questions about this error that were specific to lda(), and I noticed that CAPdiscrim() calls vegdist(), cmdscale() and lda(), so I tried to break the analysis down piece by piece:
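library(vegan)  # vegdist()
library(MASS)   # lda()

# Euclidean distance matrix between individuals
dist_matrix <- vegdist(vars,
                       method = "euclidean",
                       binary = FALSE,
                       diag = FALSE,
                       upper = FALSE,
                       na.rm = TRUE)
# Principal coordinates analysis on the distance matrix
PCA_vars <- cmdscale(d = dist_matrix,
                     k = 5,
                     eig = TRUE,
                     add = FALSE,
                     x.ret = FALSE)
# Linear discriminant analysis on the principal coordinates
LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group)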
Which returns a very similar result:
Error in lda.default(x, grouping, ...) : variables 1 2 3 4 5 appear to be constant within groups
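For what it's worth, the message seems to come from an absolute tolerance check rather than from truly constant variables. As far as I can tell from the MASS source, lda.default computes the within-group standard deviation of each column and stops if any of them falls below tol (default 1.0e-4). The sketch below imitates that check on data of the same magnitude as the example; within_sd is a hypothetical helper written for this illustration, not a MASS function, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
group <- factor(rep(c("a", "b", "c"), 10))
x <- cbind(Var1 = rnorm(30, mean = 3.0e-4, sd = 2.0e-6),
           Var5 = rnorm(30, mean = 1.0e-4, sd = 9.0e-6))

# Hypothetical helper: centre each column on its group means, then take the
# standard deviation of the residuals (the quantity lda.default appears to test).
within_sd <- function(m, g) {
  centred <- m - apply(m, 2, function(col) ave(col, g))
  sqrt(diag(var(centred)))
}

tol <- 1.0e-4                            # default tol of MASS::lda
sd_raw    <- within_sd(x, group)         # data on the original scale
sd_scaled <- within_sd(x * 1e6, group)   # same data, rescaled by a constant

print(sd_raw < tol)     # columns flagged as "constant within groups"
print(sd_scaled < tol)  # the same columns after rescaling
```

The variables are not constant at all; their spread is simply smaller than a threshold expressed in absolute units, which is why either rescaling the data or lowering tol changes the outcome.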
lda() has a "tol" argument that can be lowered to get past this error when dealing with very small numbers, so I can do this:
LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group,
                  tol = 1.0e-25)
This provides some output, but lacks some of the features of CAPdiscrim, such as letting the function determine the best value for "m" through permutations.
Can anyone suggest how to modify the tolerance in CAPdiscrim()? Or how to carry out manually, with these other functions, what CAPdiscrim() is doing under the hood?
Any insight would be greatly appreciated.
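A rough sketch of the manual route follows, under several assumptions. It chooses m by leave-one-out classification success, which is my reading of the Anderson and Willis criterion; whether CAPdiscrim does exactly this internally is not confirmed here. It uses stats::dist() in place of vegdist() (for Euclidean distance the two should agree), and it rescales the principal coordinates to unit variance instead of lowering tol, since LDA classification does not depend on the scale of individual predictors. The names k_max, success and best_m are made up for this example, and the permutation test is not reproduced.

```r
library(MASS)  # lda()

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(30, mean = 3.0e-4, sd = 2.0e-6),
  Var2 = rnorm(30, mean = 2.4e-4, sd = 2.0e-6),
  Var3 = rnorm(30, mean = 7.0e-6, sd = 9.0e-9),
  Var4 = rnorm(30, mean = 4.2e-5, sd = 1.0e-6),
  Var5 = rnorm(30, mean = 1.0e-4, sd = 9.0e-6),
  Var6 = rnorm(30, mean = 8.0e-5, sd = 1.0e-5)
)

# Principal coordinates from a Euclidean distance matrix
k_max <- 5
pcoa <- cmdscale(dist(vars, method = "euclidean"), k = k_max, eig = TRUE)

# Rescale each axis to unit variance so the default tol check in lda() passes
pts <- scale(pcoa$points)

# For each candidate m, run LDA with leave-one-out cross-validation on the
# first m axes and record the percentage of individuals classified correctly
success <- sapply(seq_len(k_max), function(m) {
  fit <- lda(x = pts[, seq_len(m), drop = FALSE], grouping = group, CV = TRUE)
  100 * mean(fit$class == group)
})

best_m <- which.max(success)  # first m with the highest success rate
print(success)
print(best_m)
```

With random data like this there is no real group structure, so the success rates should hover around chance; the point is only the mechanics of scanning m.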
Update: the author of BiodiversityR::CAPdiscrim has fixed the problem, and the fix has been rolled out in subsequent package updates. Some of the error checks relied on absolute thresholds that make sense for typical ecological data, rather than on values relative to the scale of the input data.