I analyse microbiome data using
library(phyloseq)
library(microbiome)
library(DirichletMultinomial)
and several other libraries. Fitting Dirichlet-Multinomial models to count data with dmn {DirichletMultinomial}
takes quite a long time. Can the computation be run on multiple CPU cores in R?
I tried:
dat <- abundances(pseq)
count <- as.matrix(t(dat))
fit <- lapply(1:25, dmn, count = count, verbose=TRUE)
replacing with:
library(parallel)
numCores <- detectCores()
...
fit <- mclapply(1:25, dmn, count = count, verbose=TRUE, mc.cores = numCores)
but it returns an error:
Warning message: In mclapply(1:25, dmn, count = count, verbose = TRUE, mc.cores = numCores) : all scheduled cores encountered errors in user code
I am using
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
> detectCores()
[1] 4
Can anyone help?
Best regards, Marcin
Yes, it is possible to run on multiple cores, as illustrated in your code and in section 2 of the vignette http://bioconductor.org/packages/release/bioc/vignettes/DirichletMultinomial/inst/doc/DirichletMultinomial.pdf
Probably what is happening is that there are errors for some of the values of X; what is the value of fit? Also, one might try
library(BiocParallel)
fit <- bplapply(1:25, dmn, count = count, BPPARAM = MulticoreParam(numCores))
fit will be an object that can be queried for more error information (see the BiocParallel vignette available from https://bioconductor.org/packages/BiocParallel).
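To see what "the value of fit" can tell you, here is a rough sketch using only the base parallel package. It assumes a Unix-alike (forking is not available on Windows), and toy_fit() is a hypothetical stand-in for dmn() that fails on purpose for even k; it is not part of DirichletMultinomial. Failed jobs come back from mclapply() as "try-error" objects, so the original error messages can be pulled out of the result:

```r
library(parallel)

# Hypothetical stand-in for dmn(): deliberately fails for even k.
toy_fit <- function(k, count) {
    if (k %% 2 == 0) stop("toy failure for k = ", k)
    sum(count) * k
}

# Small made-up count matrix, for illustration only.
count <- matrix(1:6, nrow = 2)

# mc.preschedule = FALSE forks one process per job, so a failure
# in one job does not mark other jobs as failed.
res <- suppressWarnings(
    mclapply(1:4, toy_fit, count = count,
             mc.cores = 2, mc.preschedule = FALSE)
)

# Elements that failed inherit from "try-error".
failed <- vapply(res, inherits, logical(1), what = "try-error")
which(failed)

# Recover the underlying error message of each failed job.
msgs <- vapply(res[failed],
               function(e) conditionMessage(attr(e, "condition")),
               character(1))
msgs
```

Applying the same inspection to your own fit should show which values of k failed and why. Running a single failing k with plain lapply() is another simple way to see the full error.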