Memory requirements can be defined per rule in the `resources` directive:
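```
rule spades:
    input:
        rules.aRule.output  # assumes aRule produces a single (unpaired) reads file
    output:
        "{sample}/spades/contigs.fasta"
    resources:
        mem_mb = 112000
    shell:
        # SPAdes is invoked as spades.py and writes contigs.fasta into the -o directory
        "spades.py -s {input} -o {wildcards.sample}/spades"
```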
For programs that let you set a memory limit directly via a command-line parameter (e.g. SPAdes), what is the difference between specifying the memory in the `resources` directive, as above, and specifying it with the program's own command-line parameter, i.e.:
```
rule spades_mem:
    input:
        rules.aRule.output
    output:
        "{sample}/spades/contigs.fasta"
    params:
        mem_spades = 112  # SPAdes' -m/--memory option takes gigabytes, not megabytes
    shell:
        "spades.py -m {params.mem_spades} -s {input} -o {wildcards.sample}/spades"
```
It is probably not a good idea to specify memory both ways, i.e.:
```
rule spades_both:
    input:
        rules.aRule.output
    output:
        "{sample}/spades/contigs.fasta"
    params:
        mem_spades = 112  # gigabytes, for SPAdes' -m
    resources:
        mem_mb = 112000   # megabytes, for Snakemake's scheduler
    shell:
        "spades.py -m {params.mem_spades} -s {input} -o {wildcards.sample}/spades"
```
However, if I do so, which one takes precedence: the program's own command-line parameter (rule `spades_mem`) or the value specified in the `resources` directive (rule `spades`)?
The `resources` directive is used by `snakemake` when requesting a suitable compute node. A memory limit passed on the program's command line only caps how much memory the program itself tries to use; it does not guarantee that the specified amount is actually available on the compute node.
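To make the division of labour concrete, here is a minimal sketch of how a cluster submission command can be filled in from a job's declared resources, in the spirit of the `--cluster "sbatch --mem={resources.mem_mb}"` style used by older Snakemake versions. The helper `build_submit_command` and the job script name are hypothetical; the point is that only `resources` feeds the scheduler, while a value in `params` never reaches it.

```python
from types import SimpleNamespace


def build_submit_command(template: str, resources: dict, jobscript: str) -> str:
    """Fill a cluster submission template from a job's declared resources.

    Hypothetical helper that mimics placeholder substitution such as
    "sbatch --mem={resources.mem_mb}". Only the resources mapping is
    consulted; values the rule passes to the tool via params (for
    example SPAdes' -m) are invisible to the scheduler.
    """
    # str.format supports attribute access, so {resources.mem_mb} works
    # once the dict is wrapped in a SimpleNamespace.
    filled = template.format(resources=SimpleNamespace(**resources))
    return f"{filled} {jobscript}"


if __name__ == "__main__":
    cmd = build_submit_command(
        "sbatch --mem={resources.mem_mb}",
        {"mem_mb": 112000},
        "snakejob.spades.sh",
    )
    print(cmd)  # sbatch --mem=112000 snakejob.spades.sh
```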
Specifying the memory constraint in both `resources` and `params` can make sense in certain scenarios. For example, if the application is greedy and consumes all available memory unless a limit is explicitly given, it may be desirable to also pass the limit via `params` so that the application doesn't consume all the memory when running locally.
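When both are set, it helps to derive one from the other so they cannot drift apart. A minimal sketch, assuming `mem_mb` is the single source of truth and that the tool (as with SPAdes' `-m`) expects an integer number of gigabytes. The helper `spades_mem_gb` and its `headroom` factor are hypothetical; the headroom keeps the tool's own cap a little below the scheduler allocation so the tool's limit triggers before the batch system kills the job.

```python
def spades_mem_gb(mem_mb: int, headroom: float = 0.9) -> int:
    """Convert a Snakemake-style mem_mb value into an integer gigabyte
    limit suitable for a tool option such as SPAdes' -m/--memory.

    Hypothetical helper: scales the allocation by `headroom`, then
    floors to whole GiB (1 GiB = 1024 MiB), erring on the low side.
    """
    if mem_mb <= 0:
        raise ValueError("mem_mb must be positive")
    gb = int(mem_mb * headroom) // 1024
    if gb < 1:
        # The tool only accepts whole gigabytes, so a tiny allocation
        # cannot be expressed safely.
        raise ValueError("allocation too small for a whole-gigabyte limit")
    return gb


if __name__ == "__main__":
    print(spades_mem_gb(112000))                # 98
    print(spades_mem_gb(112000, headroom=1.0))  # 109
```

In a Snakefile, such a helper could be called from a `params` function, e.g. `mem_spades = lambda wildcards, resources: spades_mem_gb(resources.mem_mb)`; check that your Snakemake version allows `params` functions to take a `resources` argument.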
When both `resources` and `params` specify a memory requirement, neither overrides the other. `resources` is used when requesting a suitable compute node (when running locally with a global limit such as `--resources mem_mb=<limit>`, it determines which jobs can execute simultaneously), while `params` is passed directly to the application (assuming you reference it in the shell command, as in your example).
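The local case can be pictured with a rough sketch. It assumes a simple greedy first-fit policy over pending jobs, which is a simplification (Snakemake's actual scheduler is more elaborate); the `Job` type and `pick_concurrent_jobs` function are hypothetical. It shows that only each job's declared `mem_mb` is weighed against the global budget; whatever the rule passes to the tool via `params` plays no part in the decision.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    mem_mb: int  # the value declared in the rule's resources directive


def pick_concurrent_jobs(pending: List[Job], mem_budget_mb: int) -> List[Job]:
    """Greedy first-fit sketch: start jobs in order while their declared
    mem_mb still fits into the remaining global memory budget."""
    running: List[Job] = []
    free = mem_budget_mb
    for job in pending:
        if job.mem_mb <= free:
            running.append(job)
            free -= job.mem_mb
    return running


if __name__ == "__main__":
    jobs = [Job("spades_A", 112000), Job("spades_B", 112000), Job("qc_C", 8000)]
    # With a 230000 MB budget both assemblies fit, leaving 6000 MB,
    # which is too little for the 8000 MB QC job.
    print([j.name for j in pick_concurrent_jobs(jobs, 230000)])
```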