I have a set of genes for which I need to calculate some coefficients in parallel.
The coefficients are calculated inside GeneTo_GeneCoeffs_filtered, which takes a gene name as input and returns a list of 2 data frames.
With a gene_array of length 100, I ran this command with different numbers of cores (5, 6 and 7):
```r
Coeffslist <- mclapply(gene_array, GeneTo_GeneCoeffs_filtered, mc.cores = no_cores)
```
I get errors on different gene names depending on the number of cores assigned to mclapply.
The indices of the genes for which GeneTo_GeneCoeffs_filtered fails to return the list of data frames follow a pattern.
With 7 cores, the failures are at elements 4, 11, 18, 25, ..., 95 of gene_array (every 7th); with 6 cores they are at 2, 8, 14, ..., 98 (every 6th); and with 5 cores, likewise, at every 5th element.
Most importantly, the failing indices differ between these runs, which suggests the problem is not with particular genes.
I suspect there might be a "broken" core that cannot run my function properly and that only this core generates the errors. Is there a way to trace its ID and exclude it from the cores R is allowed to use?
A close reading of mclapply's manpage reveals that this behavior is by design; it arises from the interaction between:
(a) "the input X is split into as many parts as there are cores (currently the values are spread across the cores sequentially, i.e. first value to core 1, second to core 2, ... (core + 1)-th value to core 1 etc.) and then one process is forked to each core and the results are collected."
(b) a "try-error" object will be returned for all the values involved in the failure, even if not all of them failed.
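A minimal sketch that reproduces (a) and (b) together. It assumes a Unix-alike (forking, and hence mc.cores > 1, is not supported on Windows), uses integer indices in place of gene names, and replaces GeneTo_GeneCoeffs_filtered with a hypothetical stand-in, toy_coeffs(), that fails on element 32 only:

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered:
# returns a list of 2 data frames, but fails on element 32 only.
toy_coeffs <- function(i) {
  if (i == 32) stop("element ", i, " failed")
  list(data.frame(id = i), data.frame(coef = i / 10))
}

# Default mc.preschedule = TRUE: element i goes to core ((i - 1) %% 7) + 1,
# and each core runs all of its elements inside a single try().
# suppressWarnings() hides mclapply's warning about the failed core.
res <- suppressWarnings(
  mclapply(seq_len(100), toy_coeffs, mc.cores = 7)
)

# Which elements came back as "try-error" objects?
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
print(failed)  # 4 11 18 25 32 39 46 53 60 67 74 81 88 95
```

Only element 32 actually raises an error, yet every element scheduled on the same core (core 4) comes back as the same try-error.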
In your case, by (a), your gene_array is spread "round-robin" across the cores, so successive elements handled by the same core are mc.cores indices apart; and by (b), if any one gene_array element raises an error, you get an error back for every gene_array element sent to that core, i.e. for indices spaced mc.cores apart. The failing set therefore shifts with mc.cores even if a single gene is the culprit.
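The same reasoning can be run backwards to narrow down the culprit. Assuming a single gene is at fault and that the index lists in the question are complete, that gene must appear in every run's failing set, so intersecting the 7-core and 6-core sets leaves only a few candidates (the 5-core set is left out because its starting index is not given):

```r
# Failing indices reported in the question (assumed complete).
failed_7 <- seq(4, 95, by = 7)   # 7 cores: 4, 11, ..., 95
failed_6 <- seq(2, 98, by = 6)   # 6 cores: 2, 8, ..., 98

# Under round-robin scheduling, index i lands on core ((i - 1) %% n) + 1,
# so each reported set corresponds to exactly one core.
core_of <- function(i, n) ((i - 1) %% n) + 1
stopifnot(all(core_of(failed_7, 7) == 4), all(core_of(failed_6, 6) == 2))

# A single faulty element must belong to both sets.
candidates <- intersect(failed_7, failed_6)
print(candidates)  # 32 74
```

Calling GeneTo_GeneCoeffs_filtered directly on gene_array[32] and gene_array[74] would then show which of the two actually errors.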
I refreshed my understanding of this yesterday in an exchange with Simon Urbanek (https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002098.html), in which I also provide an error-handling approach that yields errors only for the indices that actually fail.
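The linked post has its own approach; as a generic sketch (not necessarily the one in that post), catching the error inside the worker function means a core's lapply() never fails as a whole, so only the element that really errors is marked. The names toy_coeffs() and safe_coeffs() are hypothetical, and a Unix-alike is assumed:

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails on element 32 only.
toy_coeffs <- function(i) {
  if (i == 32) stop("element ", i, " failed")
  list(data.frame(id = i), data.frame(coef = i / 10))
}

# Wrap the worker so that an error becomes the returned value for that
# element instead of aborting the whole prescheduled chunk on its core.
safe_coeffs <- function(x) {
  tryCatch(toy_coeffs(x), error = function(e) e)
}

res <- mclapply(seq_len(100), safe_coeffs, mc.cores = 7)

# Condition objects now mark exactly the elements that failed.
failed <- which(vapply(res, inherits, logical(1), what = "error"))
print(failed)                        # 32
print(conditionMessage(res[[32]]))   # "element 32 failed"
```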
You can also get errors only for the failing indices by passing mc.preschedule=FALSE, which forks a separate job for each element of X instead of one job per core.
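A minimal sketch of the mc.preschedule=FALSE route, again with the hypothetical toy_coeffs() stand-in and a Unix-alike. Each element is evaluated in its own forked job (at most mc.cores at a time), so a failure no longer spills over to other elements; the trade-off is one fork per element, which matters when each call is cheap:

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails on element 32 only.
toy_coeffs <- function(i) {
  if (i == 32) stop("element ", i, " failed")
  list(data.frame(id = i), data.frame(coef = i / 10))
}

# One forked job per element; suppressWarnings() hides the warning
# mclapply emits when some calls resulted in an error.
res <- suppressWarnings(
  mclapply(seq_len(100), toy_coeffs, mc.cores = 7, mc.preschedule = FALSE)
)

failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
print(failed)  # 32
```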