python resources slurm snakemake

Setting resources dynamically in Snakemake


Context
I am running a Snakemake (v7.32.4) pipeline with the Slurm workload manager. I have set resources (time and memory for each rule) dynamically, based on input file size and the number of attempts, like this:

rule index:
    resources:
        mem_mb = lambda wildcards, input, attempt: (
            200 * attempt
        ),
        runtime = lambda wildcards, input, attempt: (
            "{minutes}min".format(
                minutes=max(
                    int((input.size_mb / 5000) * attempt),
                    1)
            )
        )

I have 2 related questions (should I split the post?):
1) Is it possible to set resources dynamically outside of the Snakefile? I tried to set them in the profile config file but didn't succeed (that was some time ago, so I cannot say exactly what I tried).

2) Having set resources dynamically inside the Snakefile, how do I do a dry run or generate a rulegraph?
If I run a dry-run I get the following error:

WorkflowError:
Cannot parse runtime value into minutes for setting runtime resource: <TBD>

This seems logical to me since the file doesn't exist yet. Nevertheless, I would like to know the specifics of all steps (except resources of course) to be run before actually running them, is this possible?
Something similar happens if I try to do a rulegraph:

snakemake --profile Config/Profiles/slurm -np --rulegraph | dot -Tsvg > rulegraph.svg

In this case, I get an empty file (probably because of the error in the dry run?).


Solution

  • 1) Is it possible to set resources dynamically outside of the Snakefile?

    Depends on the snakemake version. For v8.14.0, yes. For version 7.32.4, sort of. See both scenarios below.

    Snakemake version 7.32.4

    You can reference a function in the Snakefile and define that function in a separate module, as proposed by @SultanOrazbayev:
    Snakefile:

    from resources import get_runtime

    rule index:
        resources:
            # pass the function itself; Snakemake calls it for each job
            runtime = get_runtime
    

    resources.py:

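    def get_runtime(wildcards: dict, input: str|list[str], attempt: int) -> str:
        try:
            # scale the runtime with input size and retry attempt, at least 1 minute
            minutes = max(int((input.size_mb / 5_000) * attempt), 1)
        except FileNotFoundError:
            # input does not exist yet (e.g. during a dry run)
            minutes = 10  # this is some test value
        return f"{minutes}min"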
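
To check the fallback behaviour of a helper like get_runtime without launching Snakemake, it can be called directly with a stand-in for the input object. The sketch below is only an illustration: FakeInput is a hypothetical class written for this example (it is not part of Snakemake), and it simply raises FileNotFoundError when no size is given, to imitate an input file that does not exist yet.

```python
class FakeInput:
    """Hypothetical stand-in for Snakemake's `input` object (not a Snakemake class)."""

    def __init__(self, size_mb=None):
        self._size_mb = size_mb

    @property
    def size_mb(self):
        # Imitate a missing input file, as would be the case during a dry run.
        if self._size_mb is None:
            raise FileNotFoundError("input file does not exist yet")
        return self._size_mb


def get_runtime(wildcards, input, attempt):
    try:
        # Scale the runtime with input size and retry attempt, at least 1 minute.
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        minutes = 10  # placeholder value used when the input is missing
    return f"{minutes}min"


print(get_runtime({}, FakeInput(25_000), attempt=1))  # 5min
print(get_runtime({}, FakeInput(25_000), attempt=2))  # 10min
print(get_runtime({}, FakeInput(100), attempt=1))     # 1min (lower bound)
print(get_runtime({}, FakeInput(), attempt=1))        # 10min (fallback)
```

The last call shows why the try/except matters for dry runs: a plain string such as "10min" is returned instead of letting the missing file propagate.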
    

    Snakemake version 8.14.0

    Install the Slurm executor plugin:

    pip install snakemake-executor-plugin-slurm
    

    Then you can specify resources dynamically, entirely in your workflow profile config.yaml file:

    executor: slurm  
    set-resources:
        index:
          runtime: f"{max(int((input.size_mb / 5_000) * attempt), 1)}min"
    

    NOTE: I only tried Snakemake v8 with the Slurm plugin, and I found this solution in the Slurm plugin documentation. Hence, I don't know whether the workflow profile described above would also work without the Slurm plugin.
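
The runtime value in the profile is a Python expression written as a string, with names such as input and attempt expected to be available when it is evaluated. The sketch below only illustrates that idea with plain eval and a stand-in object. It is an assumption-laden illustration, not Snakemake's actual implementation; evaluate_resource and fake_input are hypothetical names introduced here.

```python
from types import SimpleNamespace

# The expression exactly as written in the profile's set-resources entry.
expression = 'f"{max(int((input.size_mb / 5_000) * attempt), 1)}min"'


def evaluate_resource(expr, input, attempt):
    """Evaluate a resource expression with `input` and `attempt` in scope.

    Illustration only: this is not how Snakemake is guaranteed to do it.
    """
    return eval(expr, {"input": input, "attempt": attempt})


# Hypothetical stand-in for an input object reporting 12,000 MB.
fake_input = SimpleNamespace(size_mb=12_000)

print(evaluate_resource(expression, fake_input, attempt=1))  # 2min
print(evaluate_resource(expression, fake_input, attempt=3))  # 7min
```

Testing the expression this way before putting it in config.yaml can catch syntax mistakes early, since the profile file itself only stores it as text.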

    2) Having set resources dynamically inside Snakefile, how do I do a dry run or a rulegraph?

    This question only applies to Snakemake version 7 or lower. On Snakemake v8 (I specifically tried v8.14.0), dry runs and rulegraphs work fine, even if the input file does not exist yet.
    For Snakemake v7.32.4, one way to solve it is to handle the FileNotFoundError inside the resource function, as described above (proposed by @SultanOrazbayev).