if-statementsnakemake

Best practices for integrating optional arguments in shell command inside a snakemake pipeline


I was wondering what the cleanest way to run two different shell commands depending on if a parameter is given in a snakemake config file. For the moment I am using this setup:


rule deeptools_bamCoverage_pe:
    input:
        bam="DATA/BAM/{sample_sp}_pe.bam",
        bai="DATA/BAM/{sample_sp}_pe.bam.bai"
    output:
        "DATA/BIGWIG/{sample_sp}_pe_RPGC.bw"
    log:
        "snakemake_logs/deeptools_bamCoverage/{sample_sp}_pe.log"
    run:
        if config["excluded_regions"]:
            shell("bamCoverage --normalizeUsing RPGC -bl "+config["excluded_regions"]+
                    " --effectiveGenomeSize $((2913022398-"+config["excluded_regions"].split(".")[0].split("_")[-1]+")) -e -b {input.bam} -o {output} 2>{log}")
        else:
            shell("bamCoverage --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -e -b {input.bam} -o {output} 2>{log}")


My config file may or may not contain excluded_regions: DATA/GENOMES/DAC_excl_HG38_71570285.bed depending on if the user wants to filter out some regions from the analysis. 71570285 is the total length of the excluded regions, which is needed for the normalisation calculations (this could be passed in the config file too but I am trying to keep it light to not scare away my potential users).

Is there a cleaner way to present snakemake with two different shell lines that it can run?


Solution

  • In general, not really. If the commands are completely different you are going to need an if/else statement. You could put the if/else statement within the shell code rather than using Python logic but that doesn't change much.

    I'd say in this case the two commands are actually very similar and this version looks neater to me:

    rule deeptools_bamCoverage_pe:
        input:
            bam="DATA/BAM/{sample_sp}_pe.bam",
            bai="DATA/BAM/{sample_sp}_pe.bam.bai"
        output:
            "DATA/BIGWIG/{sample_sp}_pe_RPGC.bw"
        log:
            "snakemake_logs/deeptools_bamCoverage/{sample_sp}_pe.log"
        params:
            excluded_regions = config["excluded_regions"],
            genome_size = 2913022398,
        run:
            if params.excluded_regions:
                bl_file = "-bl " + config["excluded_regions"]
                excluded_size = int(config["excluded_regions"].split(".")[0].split("_")[-1])
                effective_genome_size = params.genome_size - excluded_size
            else:
                bl_file = ""
                effective_genome_size = params.genome_size
    
            shell("bamCoverage --normalizeUsing RPGC {bl_file} --effectiveGenomeSize {effective_genome_size} -e -b {input.bam} -o {output} 2>{log}")
    

    But you if you find this less legible then stick with what you have - you're the one who has to debug/maintain your code.