python snakemake hpc

Snakemake: specifying memory in the resources directive vs. a command-line parameter for an individual rule


Memory requirements can be defined per rule in the resources directive:

rule spades:
    input:
        rules.aRule.output
    output:
        "{sample}/spades/contigs.fasta"
    resources:
        mem_mb = 112000
    shell:
        "spades {input} {output}"

For programs that accept a memory limit directly as a command-line parameter (e.g. spades), what is the difference between specifying the memory in the resources directive, as above, and specifying it with the command-line parameter of spades itself, i.e.

rule spades_mem:
    input:
        rules.aRule.output
    output:
        "{sample}/spades/contigs.fasta"
    params:
        mem_spades = 112  # SPAdes -m expects gigabytes; 112 GB corresponds to mem_mb = 112000
    shell:
        "spades -m {params.mem_spades} {input} {output}"

It is probably not a good idea to specify memory both ways, i.e.

rule spades_both:
    input:
        rules.aRule.output
    output:
        "{sample}/spades/contigs.fasta"
    params:
        mem_spades = 112  # SPAdes -m expects gigabytes; 112 GB corresponds to mem_mb = 112000
    resources:
        mem_mb = 112000
    shell:
        "spades -m {params.mem_spades} {input} {output}"

However, if I do so, which one takes precedence: the one from the binary's command-line parameter (rule spades_mem), or the one specified in the resources directive (rule spades)?


Solution

  • The resources directive is what Snakemake uses when requesting a suitable compute node. The memory limit given on the command line is only passed to the application; it does not guarantee that the specified amount is actually available on the compute node.

    Specifying the memory constraint in both resources and params can make sense in certain scenarios. For example, if the application is greedy and consumes all available memory unless a limit is set explicitly, then it might be desirable to specify params so that the application doesn't consume all the memory when running locally.

    When both resources and params specify a memory requirement, neither overrides the other, because they act at different levels: resources is used when requesting a suitable compute node (when running locally, resources determines the set of jobs that can execute simultaneously), while params is passed directly to the application (assuming you specify it as in your example).
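
If both values are kept, one way to stop them drifting apart is to derive the params value from resources rather than writing the number twice. The following is a minimal sketch, not the only way to do it: `spades_mem_gb` is a hypothetical helper name, and it assumes the tool's memory flag expects gigabytes (as SPAdes' `-m` does) while `mem_mb` is in megabytes. Snakemake allows a params entry to be a function of `wildcards` that can also take `resources` as an argument.

```python
from types import SimpleNamespace


def spades_mem_gb(wildcards, resources):
    """Convert the rule's mem_mb into whole gigabytes for a GB-based memory flag.

    Hypothetical helper: the signature matches what Snakemake passes to a
    params function that asks for `resources`.
    """
    # Integer division keeps the tool's limit at or below what was requested;
    # never return 0, since a zero limit would be meaningless.
    return max(1, resources.mem_mb // 1000)


# In a Snakefile this could be wired in as follows (same rule shape as in the
# question, shown as comments because rule syntax is not plain Python):
#
#   rule spades_both:
#       ...
#       resources:
#           mem_mb = 112000
#       params:
#           mem_spades = spades_mem_gb
#       shell:
#           "spades -m {params.mem_spades} {input} {output}"

# Quick check outside Snakemake, using a stand-in for the resources object.
print(spades_mem_gb(None, SimpleNamespace(mem_mb=112000)))  # prints 112
```

With this arrangement the number in resources remains the single source of truth: the scheduler sees `mem_mb`, and the application receives a limit computed from it.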