[SOLVED] Is it possible to see qsub job scripts and command line options etc. that will be issued by Snakemake in the dry run mode?

I am new to Snakemake and plan to use it with qsub in my cluster environment. To avoid critical mistakes that may disturb the cluster, I would like to check the qsub job scripts and the qsub command that Snakemake will generate before actually submitting the jobs to the queue.

Is it possible to see qsub job script files etc. in the dry run mode or in some other ways? I searched for relevant questions but could not find the answer. Thank you for your kind help.

Best,

Schuma

Solution

Using --printshellcmds (or its short form -p) together with --dry-run lets you see the commands Snakemake will feed to qsub, but you won't see the qsub options.

I don't know of any option that shows which parameters are given to qsub, but Snakemake follows a simple set of rules, on which you can find detailed information here and here. As you'll see, you can feed arguments to qsub in multiple ways:

With default values, by passing --default-resources resource_name1=<value1> resource_name2=<value2> when invoking snakemake.
On a per-rule basis, using resources in rules (prioritized over default values).
With explicitly set values, either for the whole pipeline using --set-resources resource_name1=<value1> or for a specific rule using --set-resources rule_name:resource_name1=<value1> (prioritized over default and per-rule values).
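The priority order described above can be sketched in plain Python. This only illustrates the ordering and is not how Snakemake implements it; the helper name resolve_resources and the example values are made up for the illustration.

```python
def resolve_resources(defaults, rule_resources, overrides):
    """Merge three resource sources; later sources win.

    Hypothetical helper that only illustrates the ordering described
    above: --default-resources < rule `resources:` < --set-resources.
    """
    resolved = dict(defaults)          # lowest priority
    resolved.update(rule_resources)    # per-rule values replace defaults
    resolved.update(overrides)         # explicitly set values win
    return resolved


# Example values (assumptions chosen for the illustration).
defaults = {"mem_mb": 1000, "runtime_min": 60}   # --default-resources
rule_resources = {"mem_mb": 2000}                # resources: in the rule
overrides = {"runtime_min": 240}                 # --set-resources

resolved = resolve_resources(defaults, rule_resources, overrides)
print(resolved)
```

Here mem_mb comes from the rule and runtime_min from the explicitly set value, while the defaults only apply to resources nobody else defines.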

Suppose you have the following pipeline:

rule all:
    input:
        "input.txt"
    output:
        "output.txt"
    resources:
        mem_mb=2000,
        runtime_min=240
    shell:
        """
        some_command {input} {output}
        """

If you call qsub through the --cluster option, you can reference the keywords of your rules (such as resources) as placeholders. Your command could then look like this:

snakemake all --cluster "qsub --runtime {resources.runtime_min} -l mem={resources.mem_mb}mb"

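This means Snakemake fills in the placeholders and submits the job roughly as if you had typed the following yourself on the command line (in practice, the shell command is wrapped in a generated job script whose path is passed to qsub):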

qsub --runtime 240 -l mem=2000mb some_command input.txt output.txt
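The placeholder substitution can be mimicked with Python's str.format, which is a handy way to preview a --cluster string before using it. This is a minimal sketch using the resource names defined in the rule above (mem_mb and runtime_min); it imitates the substitution and is not Snakemake's own code.

```python
from types import SimpleNamespace

# Resources of the example rule (values taken from the rule above).
resources = SimpleNamespace(mem_mb=2000, runtime_min=240)

# The string passed to --cluster, with {resources.<name>} placeholders.
cluster_template = "qsub --runtime {resources.runtime_min} -l mem={resources.mem_mb}mb"

# str.format resolves attribute lookups such as {resources.mem_mb}.
submit_command = cluster_template.format(resources=resources)
print(submit_command)
```

A placeholder that names a resource the rule does not define raises an AttributeError in this sketch, which is a cheap way to catch typos in resource names.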

It is up to you to decide which parameters you define where. You might want to check your cluster's documentation, or ask its administrator, which parameters are required and which to avoid.

Also note that for cluster use, the Snakemake documentation recommends setting up a profile, which you can then use with snakemake --profile profile_name instead of having to specify arguments and default values each time.

Such a profile can be written in a ~/.config/snakemake/profile_name/config.yaml file. Here is an example of such a profile:

cluster: "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}"
jobs: 256
printshellcmds: true
rerun-incomplete: true
default-resources:
  - mem_mb=1000
  - other_resource_name="foo"

Invoking snakemake all --profile profile_name corresponds to invoking

snakemake all --cluster "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}" --jobs 256 --printshellcmds --rerun-incomplete --default-resources mem_mb=1000 other_resource_name="foo"
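The correspondence between profile keys and command line options can be sketched as follows. The helper profile_to_args is hypothetical and handles only the value types that appear in the example profile (booleans, lists and scalars); the profile is written as a Python dict here to avoid depending on a YAML parser.

```python
import shlex


def profile_to_args(profile):
    """Turn a profile mapping into the equivalent list of CLI arguments.

    Hypothetical helper for illustration; it covers only the value types
    used in the example profile (booleans, lists, and scalars).
    """
    args = []
    for key, value in profile.items():
        flag = "--" + key
        if value is True:
            args.append(flag)                 # boolean switch, no value
        elif value is False:
            continue                          # disabled switches are omitted
        elif isinstance(value, list):
            args.append(flag)
            args.extend(str(item) for item in value)
        else:
            args.extend([flag, str(value)])
    return args


# The example profile from above, written as a Python dict instead of YAML.
profile = {
    "cluster": "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}",
    "jobs": 256,
    "printshellcmds": True,
    "rerun-incomplete": True,
    "default-resources": ["mem_mb=1000", 'other_resource_name="foo"'],
}

args = profile_to_args(profile)
# shlex.join adds the shell quoting needed to paste the line into a terminal.
print("snakemake all " + shlex.join(args))
```

Each key simply becomes the long option of the same name, which is why anything you can write on the command line can also go into the profile.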

You may also want to define test rules, for instance a minimal example of your pipeline, and try these first to verify that everything works before running your full pipeline.
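To inspect the qsub options and the generated job script on such a test rule without submitting anything, one possibility is to point --cluster at a stand-in script instead of the real qsub. The sketch below is a hypothetical fake_qsub.py, not part of Snakemake; it assumes that the job script path arrives as the last argument, which is how the --cluster option calls the submit command.

```python
#!/usr/bin/env python3
"""Stand-in for qsub that reports what it was given instead of submitting.

Hypothetical script (call it fake_qsub.py); it assumes the job script path
is the last argument, as with Snakemake's --cluster option.
"""
import sys
from pathlib import Path


def describe_submission(args):
    """Return a text report of the qsub options and the job script content."""
    if not args:
        return "no arguments received"
    *options, jobscript = args
    lines = ["qsub options: " + " ".join(options),
             "job script:   " + jobscript]
    path = Path(jobscript)
    if path.is_file():
        lines.append("---- job script content ----")
        lines.append(path.read_text(errors="replace"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Report on stderr: Snakemake reads the submit command's stdout
    # (normally the job ID), so stdout is left untouched.
    print(describe_submission(sys.argv[1:]), file=sys.stderr)
```

It could be used as snakemake all --jobs 1 --cluster "python fake_qsub.py -l mem={resources.mem_mb}mb", without --dry-run, since a dry run never calls the submit command. Because nothing is really submitted, Snakemake will keep waiting for a job that never finishes, so interrupt it once you have seen the report.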