snakemake qsub

Is it possible to see qsub job scripts and command line options etc. that will be issued by Snakemake in the dry run mode?


I am new to Snakemake and plan to use it to submit jobs with qsub in my cluster environment. To avoid critical mistakes that may disturb the cluster, I would like to check the qsub job scripts and the qsub command that Snakemake will generate before actually submitting the jobs to the queue.

Is it possible to see qsub job script files etc. in the dry run mode or in some other ways? I searched for relevant questions but could not find the answer. Thank you for your kind help.

Best,

Schuma


Solution

  • Using --printshellcmds or short version -p with --dry-run will allow you to see the commands snakemake will feed to qsub, but you won't see qsub options.

    I don't know of any option that shows which parameters are given to qsub, but Snakemake follows a simple set of rules, which are described in detail here and here. As you'll see, you can feed arguments to qsub in multiple ways.

    Suppose you have the following pipeline:

    rule all:
        input:
            "input.txt"
        output:
            "output.txt"
        resources:
            mem_mb=2000,
            runtime_min=240
        shell:
            """
            some_command {input} {output}
            """
    

    If you call qsub using the --cluster option, you can access all keywords of your rules in the submission command. Your command could then look like this:

    snakemake all --cluster "qsub --runtime {resources.runtime_min} -l mem={resources.mem_mb}mb"

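    This means Snakemake will submit the job to the cluster roughly as if you had typed the following on your command line. In practice, Snakemake wraps the rule's command in a generated job script and appends the path of that script to the qsub command: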

    qsub --runtime 240 -l mem=2000mb some_command input.txt output.txt
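
    The placeholder substitution described above can be mimicked with a minimal Python sketch. This is not Snakemake's own code: the helper name fill_cluster_template is hypothetical, and plain str.format stands in for Snakemake's more elaborate formatting. The resource names are the ones from the example rule.

```python
from types import SimpleNamespace


def fill_cluster_template(template: str, resources: dict) -> str:
    """Fill {resources.<name>} placeholders using plain str.format.

    Hypothetical helper for illustration only; Snakemake's real
    formatting logic is more involved.
    """
    # SimpleNamespace exposes the dict keys as attributes, so that
    # "{resources.mem_mb}" resolves through attribute access.
    return template.format(resources=SimpleNamespace(**resources))


# Resource values taken from the example rule.
rule_resources = {"mem_mb": 2000, "runtime_min": 240}
template = "qsub --runtime {resources.runtime_min} -l mem={resources.mem_mb}mb"

print(fill_cluster_template(template, rule_resources))
# prints: qsub --runtime 240 -l mem=2000mb
```

    In this sketch, a placeholder that names a resource the rule does not define (for example {resources.runtime} when the rule only sets runtime_min) raises an AttributeError, which is a cheap way to catch a mismatch between the Snakefile and the submission command before anything reaches the cluster.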

    It is up to you to decide which parameters to define where. Check your cluster's documentation, or ask its administrator, to find out which parameters are required and which to avoid.

    Also note that for cluster use, the Snakemake documentation recommends setting up a profile, which you can then use with snakemake --profile profile_name instead of specifying arguments and default values each time.

    Such a profile can be written in a ~/.config/snakemake/profile_name/config.yaml file. Here is an example of such a profile:

    cluster: "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}"
    jobs: 256
    printshellcmds: true
    rerun-incomplete: true
    default-resources:
      - mem_mb=1000
      - other_resource_name="foo"
    

    Invoking snakemake all --profile profile_name corresponds to invoking

    snakemake all --cluster "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}" --jobs 256 --printshellcmds --rerun-incomplete --default-resources mem_mb=1000 'other_resource_name="foo"'
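
    The correspondence between profile keys and command-line flags can be illustrated with a small Python sketch. It assumes the simple mapping seen in the example (key: value becomes --key value, true becomes a bare flag, a list becomes several values after one flag). The function profile_to_args is hypothetical and is not how Snakemake itself parses profiles.

```python
def profile_to_args(profile: dict) -> list:
    """Translate profile entries into command-line arguments.

    Illustrative only: assumes each key maps to the long option of
    the same name.
    """
    args = []
    for key, value in profile.items():
        flag = f"--{key}"
        if value is True:
            # Boolean switches become bare flags.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are left out entirely.
            continue
        elif isinstance(value, (list, tuple)):
            # List entries become several values after a single flag.
            args.append(flag)
            args.extend(str(item) for item in value)
        else:
            args.extend([flag, str(value)])
    return args


# A profile written as a plain dict, mirroring part of the YAML example.
profile = {
    "cluster": "qsub -l mem={resources.mem_mb}mb",
    "jobs": 256,
    "printshellcmds": True,
    "rerun-incomplete": True,
    "default-resources": ["mem_mb=1000"],
}

print(profile_to_args(profile))
```

    Printing the resulting list next to your config.yaml makes it easier to check by eye that the profile expresses the same options you would otherwise type by hand.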
    

    You may also want to define test rules, like a minimal example of your pipeline for instance, and try these first to verify all goes well before running your full pipeline.